…encoding secreted proteins are disclosed. The 5' ESTs may be used to obtain cDNAs and genomic DNAs corresponding …
Description

Background of the Invention 

[0001] The estimated 50,000-100,000 genes scattered along the human chromosomes offer tremendous promise for the understanding, diagnosis, and treatment of human diseases. In addition, probes capable of specifically hybridizing to loci distributed throughout the human genome find applications in the construction of high-resolution chromosome maps and in the identification of individuals.

[0002] In the past, the characterization of even a single human gene was a painstaking process, requiring years of effort. Recent developments in the areas of cloning vectors, DNA sequencing, and computer technology have combined to greatly accelerate the rate at which human genes can be isolated, sequenced, mapped, and characterized. Cloning vectors such as yeast artificial chromosomes (YACs) and bacterial artificial chromosomes (BACs) are able to accept DNA inserts ranging from 300 to 1000 kilobases (kb) or 100-400 kb in length, respectively, thereby facilitating the manipulation and ordering of DNA sequences distributed over great distances on the human chromosomes. Automated DNA sequencing machines permit the rapid sequencing of human genes. Bioinformatics software enables the comparison of nucleic acid and protein sequences, thereby assisting in the characterization of human gene products.

[0003] Currently, two different approaches are being pursued for identifying and characterizing the genes distributed along the human genome. In one approach, large fragments of genomic DNA are isolated, cloned, and sequenced. Potential open reading frames in these genomic sequences are identified using bioinformatics software. However, this approach entails sequencing large stretches of human DNA which do not encode proteins in order to find the protein-encoding sequences scattered throughout the genome. In addition to requiring extensive sequencing, the bioinformatics software may mischaracterize the genomic sequences obtained. Thus, the software may produce false positives, in which non-coding DNA is mischaracterized as coding DNA, or false negatives, in which coding DNA is mislabeled as non-coding DNA.

[0004] An alternative approach takes a more direct route to identifying and characterizing human genes. In this approach, complementary DNAs (cDNAs) are synthesized from isolated messenger RNAs (mRNAs) which encode human proteins. Using this approach, sequencing is only performed on DNA which is derived from protein-coding portions of the genome. Often, only short stretches of the cDNAs are sequenced to obtain sequences called expressed sequence tags (ESTs). The ESTs may then be used to isolate or purify extended cDNAs which include sequences adjacent to the EST sequences. The extended cDNAs may contain all of the sequence of the EST which was used to obtain them or only a portion of that sequence. In addition, the extended cDNAs may contain the full coding sequence of the gene from which the EST was derived or, alternatively, only portions of that coding sequence. It will be appreciated that there may be several extended cDNAs which include the EST sequence as a result of alternative splicing or the activity of alternative promoters. Alternatively, ESTs having partially overlapping sequences may be identified, and contigs comprising the consensus sequences of the overlapping ESTs may be identified.

[0005] In the past, these short EST sequences were often obtained from oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded to the 3' untranslated region of the mRNA. In part, the prevalence of EST sequences derived from the 3' end of the mRNA is a result of the fact that typical techniques for obtaining cDNAs are not well suited for isolating cDNA sequences derived from the 5' ends of mRNAs (Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828, 1996).

[0006] In addition, in those reported instances where longer cDNA sequences have been obtained, the reported sequences typically correspond to coding sequences and do not include the full 5' untranslated region (5'UTR) of the mRNA from which the cDNA is derived. 5'UTRs are often involved in the regulation of gene expression by affecting either the stability or the translation of mRNAs. Indeed, 5'UTRs may contain several features known to affect the initiation of translation: (i) the distance between the cap structure and the initiation codon, (ii) the presence of cis-acting elements which may be either linear sequences such as polypyrimidine tracts (Kaspar et al., J. Biol. Chem. 267:508-514, 1992; Severson et al., Eur. J. Biochem. 229:426-32, 1995) or secondary structures such as IREs (Rouault and Klausner, Curr. Top. Cell. Regul. 35:1-19, 1997), and (iii) upstream open reading frames or uORFs (Geballe and Morris, Trends Biochem. Sci. 19:159-64, 1994). Thus, regulation of gene expression may be achieved through the use of alternative 5'UTRs. For instance, the translation of the tissue inhibitor of metalloprotease mRNA is enhanced in mitogenically activated cells through modification of the start codon of a uORF in its 5'UTR using an alternative promoter (Waterhouse et al., J. Biol. Chem. 265:5585-9, 1990). Furthermore, modification of the 5'UTR through mutation, insertion or translocation events may even be implicated in pathogenesis. For instance, the fragile X syndrome, the most common cause of inherited mental retardation, is partly due to an insertion of multiple CGG trinucleotides in the 5'UTR of the fragile X mRNA, resulting in the inhibition of protein synthesis via ribosome stalling (Feng et al., Science 268:731-4, 1995). An aberrant mutation in regions of the 5'UTR known to inhibit translation of the proto-oncogene c-myc was shown to result in upregulation of c-myc protein levels in cells derived from patients with multiple myelomas (Willis et al., Curr. Top. Microbiol. Immunol. 224:269-76, 1997). However, the use of oligo-dT primed cDNA libraries does not allow the isolation of complete 5'UTRs, since the incomplete sequences so obtained may not include the first exon of the mRNA, particularly in situations where the first exon is short. Furthermore, they may not include some exons, often short ones, which are located upstream of splicing sites. Thus, there is a need to obtain sequences derived from the 5' ends of mRNAs.

[0007] While many sequences derived from human chromosomes have practical applications, approaches based on the identification and characterization of those chromosomal sequences which encode a protein product are particularly relevant to diagnostic and therapeutic uses. In some instances, the sequences used in such therapeutic or diagnostic techniques may be sequences which encode proteins which are secreted from the cell in which they are synthesized. Such sequences, as well as the secreted proteins themselves, are particularly valuable as potential therapeutic agents. Such proteins are often involved in cell-to-cell communication and may be responsible for producing a clinically relevant response in their target cells. In fact, several secretory proteins, including tissue plasminogen activator, G-CSF, GM-CSF, erythropoietin, human growth hormone, insulin, interferon-α, interferon-β, interferon-γ, and interleukin-2, are currently in clinical use. These proteins are used to treat a wide range of conditions, including acute myocardial infarction, acute ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney carcinoma, chemotherapy-induced neutropenia and multiple sclerosis. For these reasons, extended cDNAs encoding secreted proteins or portions thereof represent a valuable source of therapeutic agents. Thus, there is a need for the identification and characterization of secreted proteins and the nucleic acids encoding them.

[0008] In addition to being therapeutically useful themselves, secretory proteins include short peptides, called signal peptides, at their amino termini which direct their secretion. These signal peptides are encoded by the signal sequences located at the 5' ends of the coding sequences of genes encoding secreted proteins. These signal peptides can be used to direct the extracellular secretion of any protein to which they are operably linked. In addition, portions of the signal peptides, called membrane-translocating sequences, may also be used to direct the intracellular import of a peptide or protein of interest. This may prove beneficial in gene therapy strategies in which it is desired to deliver a particular gene product to cells other than the cell in which it is produced. Signal sequences encoding signal peptides also find application in simplifying protein purification techniques. In such applications, the extracellular secretion of the desired protein greatly facilitates purification by reducing the number of undesired proteins from which the desired protein must be selected. Thus, there exists a need to identify and characterize the 5' portions of the genes for secretory proteins which encode signal peptides.

[0009] Sequences coding for non-secreted proteins may also find application as therapeutics or diagnostics. In particular, such sequences may be used to determine whether an individual is likely to express a detectable phenotype, such as a disease, as a consequence of a mutation in the coding sequence for a non-secreted protein or for a secreted protein. In instances where the individual is at risk of suffering from a disease or other undesirable phenotype as a result of a mutation in such a coding sequence, the undesirable phenotype may be corrected by introducing a normal coding sequence using gene therapy. Alternatively, if the undesirable phenotype results from overexpression of the protein encoded by the coding sequence, expression of the protein may be reduced using antisense or triple helix based strategies.

[0010] The secreted or non-secreted human polypeptides encoded by the coding sequences may also be used as therapeutics by administering them directly to an individual having a condition, such as a disease, resulting from a mutation in the sequence encoding the polypeptide. In such an instance, the condition can be cured or ameliorated by administering the polypeptide to the individual.

[0011] In addition, the secreted or non-secreted human polypeptides or portions thereof may be used to generate antibodies useful in determining the tissue type or species of origin of a biological sample. The antibodies may also be used to determine the cellular localization of the secreted or non-secreted human polypeptides or the cellular localization of polypeptides which have been fused to the human polypeptides. In addition, the antibodies may also be used in immunoaffinity chromatography techniques to isolate, purify, or enrich the human polypeptide or a target polypeptide which has been fused to the human polypeptide.

[0012] Public information on the number of human genes for which the promoters and upstream regulatory regions have been identified and characterized is quite limited. In part, this may be due to the difficulty of isolating such regulatory sequences. Upstream regulatory sequences such as transcription factor binding sites are typically too short to be utilized as probes for isolating promoters from human genomic libraries. Recently, some approaches have been developed to isolate human promoters. One of them consists of making a CpG island library (Cross et al., Nature Genetics 6:236-244, 1994). The second consists of isolating human genomic DNA sequences containing SpeI binding sites by the use of SpeI binding protein (Mortlock et al., Genome Res. 6:327-335, 1996). Both of these approaches have their limits, due to a lack of specificity or because they are not universally applicable: only a limited number of promoters have either a CpG island or a SpeI recognition site, and SpeI binding sites are not specifically found in promoter regions. Thus, there exists a need to identify and systematically characterize the 5' portions of the genes.

[0013] The present 5' ESTs may be used to efficiently identify and isolate 5'UTRs and upstream regulatory regions which control the location, developmental stage, rate, and quantity of protein synthesis, as well as the stability of the mRNA. Once identified and characterized, these regulatory regions may be utilized in gene therapy or protein purification schemes to obtain the desired amount and locations of protein synthesis or to inhibit, reduce, or prevent the synthesis of undesirable gene products.

[0014] In addition, ESTs containing the 5' ends of genes encoding proteins may include sequences useful as probes for chromosome mapping and the identification of individuals. Thus, there is a need to identify and characterize the sequences upstream of the 5' coding sequences of genes.

Summary of the Invention 

[0015] The present invention relates to purified, isolated, or enriched 5' ESTs which include sequences derived from the authentic 5' ends of their corresponding mRNAs. The term "corresponding mRNA" refers to the mRNA which was the template for the cDNA synthesis which produced the 5' EST. These sequences will be referred to hereinafter as "5' ESTs." The present invention also includes purified, isolated or enriched nucleic acids comprising contigs assembled by determining a consensus sequence from a plurality of ESTs containing overlapping sequences. These contigs will be referred to herein as "consensus contigated ESTs."
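
As a purely illustrative sketch (not a procedure described in this specification), the consensus step behind "consensus contigated ESTs" can be expressed as a per-position majority vote. The function name `consensus_contig` and its input format are hypothetical: it assumes the overlapping ESTs have already been placed on a common coordinate axis as (offset, sequence) pairs with no insertions or deletions between them, and it reports uncovered or tied positions as N.

```python
from collections import Counter


def consensus_contig(placed_ests):
    """Majority-vote consensus of ESTs placed on a shared coordinate axis.

    placed_ests: list of (offset, sequence) pairs, where offset is the
    0-based position of the EST's first base within the contig.
    Assumes overlaps were determined beforehand; no alignment is done here.
    """
    if not placed_ests:
        return ""
    length = max(offset + len(seq) for offset, seq in placed_ests)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_ests:
        for i, base in enumerate(seq.upper()):
            columns[offset + i][base] += 1

    consensus = []
    for counts in columns:
        if not counts:
            consensus.append("N")  # no EST covers this position
            continue
        ranked = counts.most_common()
        best_base, best_count = ranked[0]
        # A tie for the most frequent base is reported as N.
        if len(ranked) > 1 and ranked[1][1] == best_count:
            consensus.append("N")
        else:
            consensus.append(best_base)
    return "".join(consensus)


ests = [(0, "ATGCGT"), (3, "CGTTAC"), (3, "CGTAAC"), (5, "TTA")]
print(consensus_contig(ests))  # ATGCGTTAC
```

A real assembly pipeline would first align the ESTs and tolerate indels; the sketch isolates only the consensus-calling step once the overlaps are known.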

[0016] As used herein, the term "purified" does not require absolute purity; rather, it is intended as a relative definition. Individual 5' EST clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA), and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10^4- to 10^6-fold purification of the native message. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.

[0017] As used herein, the term "isolated" requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated.

[0018] As used herein, the term "enriched" means that the 5' EST is adjacent to "backbone" nucleic acid to which it is not adjacent in its natural environment. Additionally, to be "enriched" the 5' ESTs will represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules according to the present invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest. Preferably, the enriched 5' ESTs represent 15% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. More preferably, the enriched 5' ESTs represent 50% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In a highly preferred embodiment, the enriched 5' ESTs represent 90% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules.
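
The percentage thresholds above lend themselves to a simple check. The Python sketch below is illustrative only: the function `enrichment_tier` and the tier labels are hypothetical, and it models only the percentage criterion, not the separate requirement that the 5' EST be adjacent to backbone nucleic acid to which it is not adjacent in nature.

```python
from typing import Optional

# Thresholds from paragraph [0018], as percentages of all inserts in the
# backbone population. The tier labels are illustrative, not defined terms.
TIERS = [
    (90, "highly preferred"),
    (50, "more preferred"),
    (15, "preferred"),
    (5, "enriched"),
]


def enrichment_tier(n_est_inserts: int, n_total_inserts: int) -> Optional[str]:
    """Return the highest tier met by the 5' EST inserts, or None below 5%."""
    if n_total_inserts <= 0:
        raise ValueError("the population must contain at least one insert")
    if not 0 <= n_est_inserts <= n_total_inserts:
        raise ValueError("EST insert count must lie between 0 and the total")
    for percent, label in TIERS:
        # Integer comparison avoids floating-point rounding at the boundaries.
        if 100 * n_est_inserts >= percent * n_total_inserts:
            return label
    return None


print(enrichment_tier(12, 200))  # enriched (6%)
print(enrichment_tier(3, 200))   # None (1.5%)
```

Checking tiers from the highest threshold down means each population is reported once, under the strictest criterion it satisfies.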

[0019] "Stringent", "moderate", and "low" hybridization conditions are as defined below.

[0020] The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-expression modifications of polypeptides; for example, polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.

[0021] As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and "polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex form. The term "nucleotide" is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The term "nucleotide" is also used herein to encompass "modified nucleotides" which comprise at least one of the following modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any purification methods known in the art.

[0022] The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that found in double-helical DNA, with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (see Stryer, L., Biochemistry, 4th edition, 1995).

[0023] The terms "complementary" or "complement thereof" are used herein to refer to the sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another specified polynucleotide throughout the entirety of the complementary region. For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide is paired with its complementary base. Complementary bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym for "complementary polynucleotide", "complementary nucleic acid" and "complementary nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their sequences and not upon any particular set of conditions under which the two polynucleotides would actually bind. Preferably, a "complementary" sequence is a sequence which has an A at each position where there is a T on the opposite strand, a T at each position where there is an A on the opposite strand, a G at each position where there is a C on the opposite strand and a C at each position where there is a G on the opposite strand.
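
Because complementarity is defined from sequence alone, it can be checked mechanically. In this illustrative Python sketch, `is_complementary` and `duplex_hydrogen_bonds` are hypothetical helpers; it assumes both strands are written 5' to 3', so the second strand is read in reverse (antiparallel) when compared with the first, and it treats U as pairing with A in the same way as T. The hydrogen-bond counts are those given for Watson & Crick pairs.

```python
# Watson & Crick pairs and their hydrogen-bond counts.
# frozenset makes the lookup independent of which strand a base is on.
HYDROGEN_BONDS = {
    frozenset("AT"): 2,
    frozenset("AU"): 2,
    frozenset("CG"): 3,
}


def _opposing_pairs(first: str, second: str):
    """Pair each base of `first` with the base opposite it on `second`.

    Both strands are assumed to be written 5'->3'; because the strands of a
    duplex are antiparallel, `second` is traversed in reverse.
    """
    return zip(first.upper(), reversed(second.upper()))


def is_complementary(first: str, second: str) -> bool:
    """True if every base in `first` is paired with its complementary base."""
    if not first or len(first) != len(second):
        return False
    return all(frozenset(pair) in HYDROGEN_BONDS
               for pair in _opposing_pairs(first, second))


def duplex_hydrogen_bonds(first: str, second: str) -> int:
    """Total hydrogen bonds in a fully complementary duplex."""
    if not is_complementary(first, second):
        raise ValueError("strands are not fully complementary")
    return sum(HYDROGEN_BONDS[frozenset(pair)]
               for pair in _opposing_pairs(first, second))


print(is_complementary("ATGC", "GCAT"))       # True
print(duplex_hydrogen_bonds("ATGC", "GCAT"))  # 10
```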

[0024] Thus, 5' ESTs in cDNA libraries in which one or more 5' ESTs make up 5% or more of the number of nucleic acid inserts in the backbone molecules are "enriched recombinant 5' ESTs" as defined herein. Likewise, 5' ESTs in a population of plasmids in which one or more 5' ESTs of the present invention have been inserted such that they represent 5% or more of the number of inserts in the plasmid backbone are "enriched recombinant 5' ESTs" as defined herein. However, 5' ESTs in cDNA libraries in which 5' ESTs constitute less than 5% of the number of nucleic acid inserts in the population of backbone molecules, such as libraries in which backbone molecules having a 5' EST insert are extremely rare, are not "enriched recombinant 5' ESTs."

[0025] In some embodiments, the present invention relates to 5' ESTs which are derived from genes encoding secreted proteins. As used herein, a "secreted" protein is one which, when expressed in a suitable host cell, is transported across or through a membrane, including transport as a result of signal peptides in its amino acid sequence. "Secreted" proteins include without limitation proteins secreted wholly (e.g. soluble proteins) or partially (e.g. receptors) from the cell in which they are expressed. "Secreted" proteins also include without limitation proteins which are transported across the membrane of the endoplasmic reticulum.

[0026] Such 5' ESTs include nucleic acid sequences, called signal sequences, which encode signal peptides which direct the extracellular secretion of the proteins encoded by the genes from which the 5' ESTs are derived. Generally, the signal peptides are located at the amino termini of secreted proteins.

[0027] Secreted proteins are translated by ribosomes associated with the "rough" endoplasmic reticulum. Generally, secreted proteins are co-translationally transferred to the membrane of the endoplasmic reticulum. Association of the ribosome with the endoplasmic reticulum during translation of secreted proteins is mediated by the signal peptide. The signal peptide is typically cleaved following its co-translational entry into the endoplasmic reticulum. After delivery to the endoplasmic reticulum, secreted proteins may proceed through the Golgi apparatus. In the Golgi apparatus, the proteins may undergo post-translational modification before entering secretory vesicles which transport them across the cell membrane.

[0028] The 5' ESTs of the present invention have several important applications. For example, they may be used to obtain and express cDNA clones which include the full protein coding sequences of the corresponding gene products, including the authentic translation start sites derived from the 5' ends of the coding sequences of the mRNAs from which the 5' ESTs are derived. These cDNAs will be referred to hereinafter as "full-length cDNAs." These cDNAs may also include DNA derived from mRNA sequences upstream of the translation start site. The full-length cDNA sequences may be used to express the proteins corresponding to the 5' ESTs. As discussed above, secreted proteins and non-secreted proteins may be therapeutically important. Thus, the proteins expressed from the cDNAs may be useful in treating or controlling a variety of human conditions. The 5' ESTs may also be used to obtain the corresponding genomic DNA. The term "corresponding genomic DNA" refers to the genomic DNA which encodes the mRNA from which the 5' EST was derived.

[0029] Alternatively, the 5' ESTs may be used to obtain and express extended cDNAs encoding portions of the protein. In the case of secreted proteins, the portions may comprise the signal peptides of the secreted proteins or the mature proteins generated when the signal peptide is cleaved off.

[0030] The present invention includes isolated, purified, or enriched "EST-related nucleic acids." The terms "isolated", "purified" or "enriched" have the meanings provided above. As used herein, the term "EST-related nucleic acids" means the nucleic acids of SEQ ID NOs: 24-4100 and 8178-36681, extended cDNAs obtainable using the nucleic acids of SEQ ID NOs: 24-4100 and 8178-36681, full-length cDNAs obtainable using the nucleic acids of SEQ ID NOs: 24-4100 and 8178-36681, or genomic DNAs obtainable using the nucleic acids of SEQ ID NOs: 24-4100 and 8178-36681. The present invention also includes the sequences complementary to the EST-related nucleic acids.

[0031] The present invention also includes isolated, purified, or enriched "fragments of EST-related nucleic acids." The terms "isolated", "purified" and "enriched" have the meanings described above. As used herein, the term "fragments of EST-related nucleic acids" means fragments comprising at least 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides of the EST-related nucleic acids, to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related nucleic acids being referred to. The present invention also includes the sequences complementary to the fragments of the EST-related nucleic acids.
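
For a given EST-related nucleic acid, the shortest fragments meeting an "at least k consecutive nucleotides" criterion are simply all windows of exactly k nucleotides, of which a sequence of length L has L - k + 1. The Python sketch below is illustrative; the names `fragments` and `applicable_lengths` are hypothetical, and it enumerates only windows of exactly k nucleotides rather than every longer fragment.

```python
# Minimum fragment lengths listed in paragraph [0031].
FRAGMENT_LENGTHS = (10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40,
                    50, 75, 100, 200, 300, 500, 1000)


def applicable_lengths(seq_len: int):
    """Fragment lengths consistent with a nucleic acid of seq_len nucleotides."""
    return [k for k in FRAGMENT_LENGTHS if k <= seq_len]


def fragments(seq: str, k: int):
    """Yield (1-based start, fragment) for every run of k consecutive nucleotides."""
    if k <= 0:
        raise ValueError("fragment length must be positive")
    for start in range(len(seq) - k + 1):
        yield start + 1, seq[start:start + k]


seq = "ACGTACGTACGT"                 # 12 nt, so lengths 10 and 12 apply
print(applicable_lengths(len(seq)))  # [10, 12]
print(list(fragments(seq, 10)))
# [(1, 'ACGTACGTAC'), (2, 'CGTACGTACG'), (3, 'GTACGTACGT')]
```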

[0032] The present invention also includes isolated, purified, or enriched "positional segments of EST-related nucleic acids." The terms "isolated", "purified", or "enriched" have the meanings provided above. As used herein, the term "positional segments of EST-related nucleic acids" includes segments comprising nucleotides 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200, 201-225, 226-250, 251-275, 276-300, 301-325, 326-350, 351-375, 376-400, 401-425, 426-450, 451-475, 476-500, 501-525, 526-550, 551-575, 576-600 and 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to. The term "positional segments of EST-related nucleic acids" also includes segments comprising nucleotides 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600 or 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to. The term "positional segments of EST-related nucleic acids" also includes segments comprising nucleotides 1-100, 101-200, 201-300, 301-400, 401-500, 501-600, or 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to. In addition, the term "positional segments of EST-related nucleic acids" includes segments comprising nucleotides 1-200, 201-400, 401-600, or 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to. The present invention also includes the sequences complementary to the positional segments of EST-related nucleic acids.
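
The positional segments above follow a regular pattern: fixed-width windows up to nucleotide 600, then one segment running from nucleotide 601 to the terminal nucleotide. The Python sketch below, built around the hypothetical function `positional_segments`, generates these boundaries as 1-based inclusive pairs. Its handling of short sequences is an assumption: a fixed-width window that runs past the terminal nucleotide is truncated there rather than omitted.

```python
def positional_segments(seq_len: int, width: int, last_fixed_end: int = 600):
    """Return 1-based, inclusive (start, end) positional segments.

    Fixed-width windows cover positions 1..last_fixed_end; one final segment
    runs from last_fixed_end + 1 to the terminal position. Windows extending
    past seq_len are truncated at the terminal position (an assumption about
    what "consistent with the lengths" means), and windows starting beyond it
    are omitted.
    """
    if seq_len <= 0 or width <= 0:
        raise ValueError("seq_len and width must be positive")
    segments = []
    start = 1
    while start <= min(seq_len, last_fixed_end):
        end = min(start + width - 1, last_fixed_end, seq_len)
        segments.append((start, end))
        start += width
    if seq_len > last_fixed_end:
        segments.append((last_fixed_end + 1, seq_len))
    return segments


print(positional_segments(700, 200))
# [(1, 200), (201, 400), (401, 600), (601, 700)]
print(positional_segments(60, 25))
# [(1, 25), (26, 50), (51, 60)]
```

The same helper would cover the 25- and 50-residue schemes for EST-related polypeptides, which switch to a single terminal segment after residue 200, by passing `last_fixed_end=200`.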

[0033] The present invention also includes isolated, purified, or enriched "fragments of positional segments of EST-related nucleic acids." The terms "isolated", "purified", or "enriched" have the meanings provided above. As used herein, the term "fragments of positional segments of EST-related nucleic acids" refers to fragments comprising at least 10, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 150, or 200 consecutive nucleotides of the positional segments of EST-related nucleic acids. The present invention also includes the sequences complementary to the fragments of positional segments of EST-related nucleic acids.

[0034] The present invention also includes isolated or purified "EST-related polypeptides." The terms "isolated" or "purified" have the meanings provided above. As used herein, the term "EST-related polypeptides" means the polypeptides encoded by the EST-related nucleic acids, including the polypeptides of SEQ ID NOs: 4101-8177.

[0035] The present invention also includes isolated or purified "fragments of EST-related polypeptides." The terms "isolated" or "purified" have the meanings provided above. As used herein, the term "fragments of EST-related polypeptides" means fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of an EST-related polypeptide, to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related polypeptides being referred to.

[0036] The present invention also includes isolated or purified "positional segments of EST-related polypeptides." As used herein, the term "positional segments of EST-related polypeptides" includes polypeptides comprising amino acid residues 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200, or 201-the C-terminal amino acid of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to. The term "positional segments of EST-related polypeptides" also includes segments comprising amino acid residues 1-50, 51-100, 101-150, 151-200 or 201-the C-terminal amino acid of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to. The term "positional segments of EST-related polypeptides" also includes segments comprising amino acids 1-100 or 101-200 of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to. In addition, the term "positional segments of EST-related polypeptides" includes segments comprising amino acid residues 1-200 or 201-the C-terminal amino acid of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to.
[0037] The present invention also includes isolated or purified "fragments of positional segments of EST-related polypeptides." The terms "isolated" and "purified" have the meanings provided above. As used herein, the term "fragments of positional segments of EST-related polypeptides" means fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of positional segments of EST-related polypeptides to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related polypeptides being referred to.
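A fragment of k consecutive amino acids is simply a sliding window of width k, so a segment of length L contains L - k + 1 such fragments and none when k exceeds L. The short Python sketch below enumerates them; the function names and the example segment are hypothetical.

```python
FRAGMENT_LENGTHS = (5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150)


def consecutive_fragments(segment, k):
    """Return every run of exactly k consecutive residues in `segment`.

    A segment of length L contains L - k + 1 such runs; when k > L the
    range is empty and no fragment is returned.
    """
    if k <= 0:
        raise ValueError("fragment length must be positive")
    return [segment[i:i + k] for i in range(len(segment) - k + 1)]


def applicable_lengths(segment):
    """Fragment lengths from paragraph [0037] that fit inside `segment`."""
    return [k for k in FRAGMENT_LENGTHS if k <= len(segment)]


segment = "MKTAYIAKQRQISFVKSHFS"  # hypothetical 20-residue segment
print(len(consecutive_fragments(segment, 15)))  # 6
print(applicable_lengths(segment))              # [5, 10, 15, 20]
```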
[0038] The present invention also includes antibodies which specifically recognize the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. In the case of secreted proteins, such as those of SEQ ID NOs: 7798-7888, antibodies which specifically recognize the mature protein generated when the signal peptide is cleaved may also be obtained as described below. Similarly, antibodies which specifically recognize the signal peptides of SEQ ID NOs: 4101-4729 or 7798-7888 may also be obtained.

[0039] In some embodiments and in the case of secreted proteins, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids include a signal sequence. In other embodiments, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may include the full coding sequence for the protein or, in the case of secreted proteins, the full coding sequence of the mature protein (i.e., the protein generated when the signal polypeptide is cleaved off). In addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may include regulatory regions upstream of the translation start site or downstream of the stop codon which control the amount, location, or developmental stage of gene expression.

[0040] As discussed above, both secreted and non-secreted human proteins may be therapeutically important. Thus, the proteins expressed from the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may be useful in treating or controlling a variety of human conditions.

[0041] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may be used in forensic procedures to identify individuals or in diagnostic procedures to identify individuals having genetic diseases resulting from abnormal gene expression. In addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids are useful for constructing a high resolution map of the human chromosomes.

[0042] The present invention also relates to secretion vectors capable of directing the secretion of a protein of interest. Such vectors may be used in gene therapy strategies in which it is desired to produce a gene product in one cell which is to be delivered to another location in the body. Secretion vectors may also facilitate the purification of desired proteins.

[0043] The present invention also relates to expression vectors capable of directing the expression of an inserted gene in a desired spatial or temporal manner or at a desired level. Such vectors may include sequences upstream of the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids, such as promoters or upstream regulatory sequences.
[0044] The present invention also comprises fusion vectors for making chimeric polypeptides comprising a first polypeptide and a second polypeptide. Such vectors are useful for determining the cellular localization of the chimeric polypeptides or for isolating, purifying, or enriching the chimeric polypeptides.

[0045] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may also be used for gene therapy to control or treat genetic diseases. In the case of secreted proteins, signal peptides may be fused to heterologous proteins to direct their extracellular secretion.
[0046] Bacterial clones containing Bluescript plasmids having inserts containing the sequence of the non-clustered 5' ESTs are presently stored at -80°C in 4% (v/v) glycerol in the inventor's laboratories under the designations. The non-clustered 5' ESTs are those which comprise a single EST from a single tissue in the listing of Table II. The inserts may be recovered from the stored materials by growing the appropriate clones on a suitable medium. The Bluescript DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art, such as alkaline lysis minipreps or large scale alkaline lysis plasmid isolation procedures. If desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done with primers designed at both ends of the inserted EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids. The PCR product which corresponds to the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids can then be manipulated using standard cloning techniques familiar to those skilled in the art.
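As a rough illustration of designing PCR primers "at both ends" of an insert, the sketch below takes the forward primer from the 5' end of the insert's sense strand and the reverse primer as the reverse complement of its 3' end, so both are written 5' to 3'. The 20-nucleotide default length and the function names are assumptions; practical primer design would also weigh melting temperature, GC content, and self-complementarity.

```python
# Watson-Crick complements; 'N' (any nucleotide) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_primers(insert, length=20):
    """Return (forward, reverse) primers flanking `insert`, both 5'->3'."""
    if len(insert) < length:
        raise ValueError("insert is shorter than the requested primer length")
    forward = insert[:length]
    reverse = reverse_complement(insert[-length:])
    return forward, reverse


print(end_primers("ATGAAACCCGGGTTTTAG", length=6))  # ('ATGAAA', 'CTAAAA')
```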
[0047] One embodiment of the present invention is a purified nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

[0048] Another embodiment of the present invention 
is a purified nucleic acid comprising at least 10 consec- 
utive nucleotides of a sequence selected from the group 
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681 and sequences complementary to the se- 
quences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681. 

[0049] Another embodiment of the present invention is a purified nucleic acid comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

[0050] A further embodiment of the present invention is a purified nucleic acid comprising the coding sequence of a sequence selected from the group consisting of SEQ ID NOs: 24-4100.

[0051] Yet another embodiment of the present invention is a purified nucleic acid comprising the full coding sequence of a sequence selected from the group consisting of SEQ ID NOs: 3721-3811, wherein the full coding sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein.

Still another embodiment of the present invention is a purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs: 3721-3811 which encodes the mature protein.
[0052] Another embodiment of the present invention is a purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs: 24-652 and 3721-3811 which encodes the signal peptide.

[0053] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177.
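The correspondence between an encoding nucleic acid and its polypeptide is fixed by the genetic code. The Python sketch below translates an in-frame coding sequence using the standard genetic code (NCBI translation table 1). Mapping any codon that contains an ambiguous base such as "n" to "X" is an assumption made here to echo the "Xaa" convention of the Sequence Listing; the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds, to_stop=True):
    """Translate an in-frame DNA coding sequence into one-letter amino acids.

    Codons not in the table (e.g. containing 'N') become 'X'. With
    to_stop=True, translation ends at the first stop codon ('*').
    """
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for pos in range(0, usable, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))  # hypothetical ORF -> MAK
```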
[0054] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs: 7798-7888.
[0055] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a mature protein included in a sequence selected from the group consisting of the sequences of SEQ ID NOs: 7798-7888.

[0056] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a signal peptide included in a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-4729 and 7798-7888.
[0057] Another embodiment of the present invention is a purified nucleic acid at least 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 nucleotides in length which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.
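Stringent hybridization conditions are defined elsewhere in the specification and are not reproduced here. As a familiar back-of-the-envelope aid for the shortest probe lengths listed above, the Wallace rule estimates the melting temperature of a short oligonucleotide (roughly 14-20 nucleotides) as 2 °C per A/T plus 4 °C per G/C. The sketch below implements only that rule of thumb, not the patent's definition of stringency; the function name is hypothetical.

```python
def wallace_tm(oligo):
    """Approximate Tm in degrees C: 2*(A+T) + 4*(G+C).

    Only meaningful for short oligonucleotides; ambiguous bases such as
    'N' are ignored.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ATGCATGCATGCATG"))  # 15-mer: 8 A/T, 7 G/C -> 44
```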
[0058] Another embodiment of the present invention is a purified or isolated polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177.
[0059] Another embodiment of the present invention is a purified or isolated polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs: 7798-7888.

[0060] Another embodiment of the present invention is a purified or isolated polypeptide comprising a mature protein of a polypeptide selected from the group consisting of SEQ ID NOs: 7798-7888.
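In the accompanying Sequence Listing, the signal peptide is reported as a "seq" entry in which "/" marks the proteolytic cleavage site, so a mature protein can be read off by removing the residues before the "/" from the N-terminus. The sketch below assumes that the residues before "/" match the start of the full polypeptide and that any residues after "/" are the first residues of the mature protein; the function name and the example sequences are invented for illustration.

```python
def mature_protein(full_polypeptide, signal_seq_field):
    """Return the mature protein left after cleaving the signal peptide.

    `signal_seq_field` follows the convention in which '/' marks the
    cleavage site, e.g. 'MKLLVLAVA/QP' (hypothetical).
    """
    signal, slash, after = signal_seq_field.partition("/")
    if not slash:
        raise ValueError("no '/' cleavage mark in the signal peptide field")
    if not full_polypeptide.startswith(signal):
        raise ValueError("signal peptide does not match the N-terminus")
    mature = full_polypeptide[len(signal):]
    if not mature.startswith(after):
        raise ValueError("residues after '/' do not match the mature protein")
    return mature


print(mature_protein("MKLLVLAVAQPEGRW", "MKLLVLAVA/QP"))  # QPEGRW
```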
[0061] Another embodiment of the present invention 
is a purified or isolated polypeptide comprising a signal 
peptide of a sequence selected from the group consist- 
ing of the polypeptides of SEQ ID NOs: 4101-4729 and 
7798-7888. 

[0062] Another embodiment of the present invention is a purified or isolated polypeptide comprising at least 10 consecutive amino acids of a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177.

[0063] Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a collection of mRNA molecules from human cells with a primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, hybridizing said primer to an mRNA in said collection that encodes said protein, reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA, making a second cDNA strand complementary to said first cDNA strand, and isolating the resulting cDNA encoding said protein comprising said first cDNA strand and said second cDNA strand.
[0064] Another embodiment of the present invention 
is a purified cDNA obtainable by the method of the pre- 
ceding paragraph. 

[0065] In one aspect of this embodiment, the cDNA 
encodes at least a portion of a human polypeptide. 
[0066] Another embodiment of the present invention is a method of making a cDNA comprising the steps of obtaining a cDNA comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, contacting said cDNA with a detectable probe comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 under conditions which permit said probe to hybridize to said cDNA, identifying a cDNA which hybridizes to said detectable probe, and isolating said cDNA which hybridizes to said probe.

[0067] Another embodiment of the present invention 
is a purified cDNA obtainable by the method of the pre- 
ceding paragraph. 

[0068] In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide.
[0069] Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a collection of mRNA molecules from human cells with a first primer capable of hybridizing to the polyA tail of said mRNA, hybridizing said first primer to said polyA tail, reverse transcribing said mRNA to make a first cDNA strand, making a second cDNA strand complementary to said first cDNA strand using at least one primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and isolating the resulting cDNA comprising said first cDNA strand and said second cDNA strand.
[0070] Another embodiment of the present invention 
is a purified cDNA obtainable by the method of the pre- 
ceding paragraph. 

[0071] In one aspect of this embodiment, said cDNA 
encodes at least a portion of a human polypeptide. 
[0072] In another aspect of the preceding method, the second cDNA strand is made by contacting said first cDNA strand with a first pair of primers, said first pair of primers comprising a second primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and a third primer having a sequence therein which is included within the sequence of said first primer, performing a first polymerase chain reaction with said first pair of primers to generate a first PCR product, contacting said first PCR product with a second pair of primers, said second pair of primers comprising a fourth primer, said fourth primer comprising at least 15 consecutive nucleotides of said sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and a fifth primer, wherein said fourth and fifth primers hybridize to sequences within said first PCR product, and performing a second polymerase chain reaction, thereby generating a second PCR product.
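The nested amplification just described can be modeled in silico: the first primer pair defines a first product, and the second pair must bind within that product, so the second product is a sub-sequence of the first. The sketch below assumes exact primer matches on a single sense-strand template, with reverse primers written 5' to 3' on the opposite strand. Primer names follow the numbering of the preceding paragraph, but all sequences are invented and do not model the polyA-anchored third primer; the function names are hypothetical, and real annealing tolerates mismatches.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pcr_product(template, forward, reverse):
    """Return the amplicon bounded by `forward` and `reverse`, or None.

    `forward` must occur verbatim in the sense-strand `template`; `reverse`
    is written 5'->3' on the antisense strand, so its reverse complement is
    searched for downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template: outer forward site, inner forward site, core,
# inner reverse site, outer reverse site.
template = ("TTTT" + "GATTACAGAT" + "CC" + "GGCATGCAAG" + "TTTAAACCC"
            + "CTCGAGTTGA" + "GG" + "GTCGACCTAG" + "TTTT")
second_primer = "GATTACAGAT"                      # outer forward
third_primer = reverse_complement("GTCGACCTAG")   # outer reverse
fourth_primer = "GGCATGCAAG"                      # inner forward
fifth_primer = reverse_complement("CTCGAGTTGA")   # inner reverse

first_product = pcr_product(template, second_primer, third_primer)
second_product = pcr_product(first_product, fourth_primer, fifth_primer)
print(second_product)
```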

[0073] One aspect of this embodiment is a purified 
cDNA obtainable by the method of the preceding para- 
graph. 

[0074] In another aspect of this embodiment, said cDNA encodes at least a portion of a human polypeptide.
[0075] Alternatively, the second cDNA strand may be made by contacting said first cDNA strand with a second primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, hybridizing said second primer to said first strand cDNA, and extending said hybridized second primer to generate said second cDNA strand.

[0076] One aspect of the above embodiment is a purified cDNA obtainable by the method of the preceding paragraph.

[0077] In a further aspect of this embodiment said cD- 
NA encodes at least a portion of a human polypeptide. 

[0078] Another embodiment of the present invention is a method of making a polypeptide comprising the steps of obtaining a cDNA which encodes a polypeptide encoded by a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 or a cDNA which encodes a polypeptide comprising at least 10 consecutive amino acids of a polypeptide encoded by a sequence selected from the group consisting of SEQ ID NOs: 24-4100, inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter, introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA, and isolating said protein.

[0079] Another aspect of this embodiment is an isolated protein obtainable by the method of the preceding paragraph.

[0080] Another embodiment of the present invention is a method of obtaining a promoter DNA comprising the steps of obtaining genomic DNA located upstream of a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, screening said genomic DNA to identify a promoter capable of directing transcription initiation, and isolating said DNA comprising said identified promoter.
[0081] In one aspect of this embodiment, said obtaining step comprises walking from genomic DNA comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. In another aspect of this embodiment, said screening step comprises inserting genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 into a promoter reporter vector. For example, said screening step may comprise identifying motifs in genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 which are transcription factor binding sites or transcription start sites.
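Identifying motifs in upstream genomic DNA amounts, in its simplest form, to a pattern search. The sketch below reports every (possibly overlapping) match of a regular-expression consensus on one strand. The TATA-box consensus TATA(A/T)A(A/T) is used purely as an illustrative motif and is not taken from this specification; dedicated tools typically score position weight matrices on both strands, and the function name is hypothetical.

```python
import re


def find_motif_sites(upstream, pattern):
    """Return 1-based start positions of every (possibly overlapping) match
    of the regular expression `pattern` on the given strand of `upstream`."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() + 1 for m in lookahead.finditer(upstream.upper())]


# Illustrative TATA-box consensus TATA(A/T)A(A/T); not from the patent.
TATA_BOX = "TATA[AT]A[AT]"
print(find_motif_sites("GGTATAAAAGGCTATATAT", TATA_BOX))  # [3, 13]
```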
[0082] Another embodiment of the present invention is an isolated promoter obtainable by the method of the paragraph above.

Another embodiment of the present invention is the inclusion of at least one sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and fragments comprising at least 15 consecutive nucleotides of said sequence in an array of discrete ESTs or fragments thereof of at least 15 nucleotides in length. In some aspects of this embodiment, the array includes at least two sequences selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and fragments comprising at least 15 consecutive nucleotides of said sequences. In another aspect of this embodiment, the array includes at least five sequences selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and fragments comprising at least 15 consecutive nucleotides of said sequences.

[0083] Another embodiment of the present invention 
is an enriched population of recombinant nucleic acids, 
said recombinant nucleic acids comprising an insert nu- 
cleic acid and a backbone nucleic acid, wherein at least 
5% of said insert nucleic acids in said population com- 
prise a sequence selected from the group consisting of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 
and the sequences complementary to SEQ ID NOs: 
24-4100 and SEQ ID NOs: 8178-36681. 
[0084] Another embodiment of the present invention 
is a purified or isolated antibody capable of specifically 
binding to a polypeptide comprising a sequence select- 
ed from the group consisting of SEQ ID NOs: 
4101-8177. 

A purified or isolated antibody capable of specifically 
binding to a polypeptide comprising at least 10 consec- 
utive amino acids of a sequence selected from the group 
consisting of SEQ ID NOs: 4101-8177. 
An antibody composition capable of selectively binding to an epitope-containing fragment of a polypeptide comprising a contiguous span of at least 8 amino acids of any of SEQ ID NOs: 4101-8177, wherein said antibody is polyclonal or monoclonal.

[0085] Another embodiment of the present invention 
is a computer readable medium having stored thereon 
a sequence selected from the group consisting of a nu- 
cleic acid code of SEQ ID NOs: 24-4100 and 
8178-36681 and a polypeptide code of SEQ ID NOs: 
4101-8177. 

[0086] Another embodiment of the present invention is a computer system comprising a processor and a data storage device wherein said data storage device has stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177. In one aspect of this embodiment, the computer system further comprises a sequence comparer and a data storage device having reference sequences stored thereon. For example, the sequence comparer may comprise a computer program which indicates polymorphisms.

In another aspect of this embodiment, the computer sys- 
tem further comprises an identifier which identifies fea- 
tures in said sequence. 
[0087] Another embodiment of the present invention is a method for comparing a first sequence to a reference sequence wherein said first sequence is selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177, comprising the steps of reading said first sequence and said reference sequence through use of a computer program which compares sequences and determining differences between said first sequence and said reference sequence with said computer program. In some aspects of this embodiment, said step of determining differences between the first sequence and the reference sequence comprises identifying polymorphisms.
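The comparison program is left unspecified above. One of the simplest things such a program can do is list the positions at which two already-aligned sequences differ, which are candidate single-nucleotide polymorphisms. The sketch below handles substitutions only; insertions and deletions would require a proper alignment step first (for example Needleman-Wunsch or BLAST). Skipping "n" positions follows the Sequence Listing convention that "n" may be any nucleotide; the function name is hypothetical.

```python
def find_substitutions(first, reference, ambiguous="Nn"):
    """List single-position differences between two pre-aligned sequences.

    Returns (position, reference_base, first_base) tuples with 1-based
    positions. Positions where either sequence has an ambiguity code are
    skipped, since 'n' may stand for any nucleotide.
    """
    if len(first) != len(reference):
        raise ValueError("sequences must already be aligned to equal length")
    differences = []
    for pos, (a, b) in enumerate(zip(first, reference), start=1):
        if a in ambiguous or b in ambiguous:
            continue
        if a.upper() != b.upper():
            differences.append((pos, b, a))
    return differences


print(find_substitutions("ACGTNACGT", "ACCTAACGA"))  # [(3, 'C', 'G'), (9, 'A', 'T')]
```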

[0088] Another embodiment of the present invention is a method for identifying a feature in a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177, comprising the steps of reading said sequence through the use of a computer program which identifies features in sequences and identifying features in said sequence with said computer program.

[0089] Another embodiment of the present invention 
is a vector comprising a nucleic acid according to any 
one of the nucleic acids described above. 
[0090] Another embodiment of the present invention is a host cell containing the above vector.
[0091] Another embodiment of the present invention is a method of making any of the nucleic acids described above comprising the steps of introducing said nucleic acid into a host cell such that said nucleic acid is present in multiple copies in each host cell and isolating said nucleic acid from said host cell.
[0092] Another embodiment of the present invention is a method of making a nucleic acid of any of the nucleic acids described above comprising the step of sequentially linking together the nucleotides in said nucleic acids.

[0093] Another embodiment of the present invention is a method of making any of the polypeptides described above wherein said polypeptide is 150 amino acids in length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.
[0094] Another embodiment of the present invention is a method of making any of the polypeptides described above wherein said polypeptide is 120 amino acids in length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.

Brief Description of the Sequence Listing

[0095] SEQ ID NOs: 1, 3, 5, 7, 9, 11, and 13 are full-length cDNAs prepared using the methods described herein.

[0096] SEQ ID NOs: 2, 4, 6, 8, 10, 12, and 14 are the polypeptides encoded by the nucleic acids of SEQ ID NOs: 1, 3, 5, 7, 9, 11, and 13.

[0097] SEQ ID NOs: 15, 16, 18, 19, 21, and 22 are primers whose use is described in the specification.

[0098] SEQ ID NOs: 17, 20, and 23 are the sequences of nucleic acids containing transcription factor binding sites which were obtained as described below.

[0099] SEQ ID NOs: 24-652 are nucleic acids having an incomplete ORF which encodes a signal peptide. As used herein, an "incomplete ORF" is an open reading frame in which a start codon has been identified but no stop codon has been identified. The locations of the incomplete ORFs and sequences encoding signal peptides are listed in the accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide computed as described below is listed as the "score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a mature protein.

[0100] SEQ ID NOs: 653-3720 are nucleic acids having an incomplete ORF in which no sequence encoding a signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a sequence encoding a signal peptide in these nucleic acids. The locations of the incomplete ORFs are listed in the accompanying Sequence Listing.

[0101] SEQ ID NOs: 3721-3811 are nucleic acids having a complete ORF which encodes a signal peptide. As used herein, a "complete ORF" is an open reading frame in which a start codon and a stop codon have been identified. The locations of the complete ORFs and sequences encoding signal peptides are listed in the accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide computed as described below is listed as the "score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a mature protein.

[0102] SEQ ID NOs: 3812-4100 are nucleic acids having a complete ORF in which no sequence encoding a signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a sequence encoding a signal peptide in these nucleic acids. The locations of the complete ORFs are listed in the accompanying Sequence Listing.

[0103] SEQ ID NOs: 4101-4729 are "incomplete polypeptide sequences" which include a signal peptide. "Incomplete polypeptide sequences" are polypeptide sequences encoded by nucleic acids in which a start codon has been identified but no stop codon has been identified. These polypeptides are encoded by the nucleic acids of SEQ ID NOs: 24-652. The location of the signal peptide is listed in the accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide computed as described below is listed as the "score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a mature protein.

[0104] SEQ ID NOs: 4730-7797 are incomplete polypeptide sequences in which no signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a signal peptide in these polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID NOs: 653-3720.

[0105] SEQ ID NOs: 7798-7888 are "complete polypeptide sequences" which include a signal peptide. "Complete polypeptide sequences" are polypeptide sequences encoded by nucleic acids in which a start codon and a stop codon have been identified. These polypeptides are encoded by the nucleic acids of SEQ ID NOs: 3721-3811. The location of the signal peptide is listed in the accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide computed as described below is listed as the "score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a mature protein.

[0106] SEQ ID NOs: 7889-8177 are complete polypeptide sequences in which no signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a signal peptide in these polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID NOs: 3812-4100.

[0107] SEQ ID NOs: 8178-36681 are nucleic acid sequences in which no open reading frame has been conclusively identified to date. However, it remains possible that subsequent analysis will identify an open reading frame in these nucleic acids.

[0108] In the accompanying Sequence Listing, all instances of the symbol "n" in the nucleic acid sequences mean that the nucleotide can be adenine, guanine, cytosine, or thymine. In some instances the polypeptide sequences in the Sequence Listing contain the symbol "Xaa." These "Xaa" symbols indicate either (1) a residue which cannot be identified because of nucleotide sequence ambiguity or (2) a stop codon in the determined sequence where applicants believe one should not exist (if the sequence were determined more accurately). In some instances, several possible identities of the unknown amino acids may be suggested by the genetic code.
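The complete/incomplete distinction used for the ORFs and polypeptides of the Sequence Listing reduces to whether an in-frame stop codon follows the identified start codon. The sketch below applies that test, assuming an ATG start codon at a known position and the three standard stop codons. How the start codons were actually located is not described in this passage, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_orf(seq, start):
    """Classify the reading frame beginning at a 0-based ATG index.

    Returns ("complete", orf) if an in-frame stop codon is found, otherwise
    ("incomplete", orf) where orf runs to the last whole codon of `seq`.
    """
    seq = seq.upper()
    if seq[start:start + 3] != "ATG":
        raise ValueError("no ATG start codon at the given position")
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return "complete", seq[start:pos + 3]
    # No in-frame stop codon before the end of the sequence.
    end = start + 3 * ((len(seq) - start) // 3)
    return "incomplete", seq[start:end]


print(classify_orf("GGATGAAACCCTAGGG", 2))  # ('complete', 'ATGAAACCCTAG')
print(classify_orf("GGATGAAACCC", 2))       # ('incomplete', 'ATGAAACCC')
```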

Brief Description of the Drawings 

[0109] Figure 1 summarizes the computer analysis procedure for obtaining consensus contigated ESTs.
[0110] Figure 2 is an analysis of the 43 amino-terminal amino acids of all human SwissProt proteins to determine the frequency of false positives and false negatives using the techniques for signal peptide identification described herein.

[0111] Figure 3 illustrates methods for making extended cDNAs.

[0112] Figure 4 provides a schematic description of the promoters isolated and the way they are assembled with the corresponding 5' tags.
[0113] Figure 5 describes the transcription factor binding sites present in each of these promoters.

Detailed Description of the Preferred Embodiment

I. General Methods for Obtaining 5' ESTs derived from mRNAs with intact 5' ends

[0114] In order to obtain the 5' ESTs of the present invention, mRNAs with intact 5' ends must be obtained. Example 1 below describes the preparation of 5' ESTs.

EXAMPLE 1 

Preparation of mRNA 

[0115] Total human RNAs or polyA+ RNAs derived from 30 different tissues were respectively purchased from LABIMO and CLONTECH and used to generate 42 cDNA libraries as described below. The purchased RNA had been isolated from cells or tissues using acid guanidinium thiocyanate-phenol-chloroform extraction (Chomczynski and Sacchi, Analytical Biochemistry 162:156-159, 1987). PolyA+ RNA was isolated from total RNA (LABIMO) by two passes of oligo dT chromatography, as described by Aviv and Leder (Proc. Natl. Acad. Sci. USA 69:1408-1412, 1972), in order to eliminate ribosomal RNA.

[0116] The quality and the integrity of the polyA+ RNAs were checked. Northern blots hybridized with a globin probe were used to confirm that the mRNAs were not degraded. Contamination of the polyA+ mRNAs by ribosomal sequences was checked using Northern blots and a probe derived from the sequence of the 28S rRNA. Preparations of mRNAs with less than 5% of rRNAs were used in library construction. To avoid constructing libraries with RNAs contaminated by exogenous sequences (prokaryotic or fungal), the presence of bacterial 16S ribosomal sequences or of two highly expressed fungal mRNAs was examined using PCR.
[0117] Following preparation of the mRNAs from various tissues, an oligonucleotide tag was specifically attached to the caps at the 5' ends of the mRNAs. The oligonucleotide tag had an EcoRI site therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA was examined by performing a Northern blot with 200 to 500 ng of mRNA using a probe complementary to the oligonucleotide tag before performing the first strand synthesis described in Example 2.

EXAMPLE 2 

cDNA Synthesis Using mRNA Templates Having Intact 5' Ends

[0118] For the mRNAs joined to oligonucleotide tags, first strand cDNA synthesis was performed using a reverse transcriptase with random nonamers as primers. In order to protect internal EcoRI sites in the cDNA from digestion at later steps in the procedure, methylated dCTP was used for first strand synthesis. After removal of RNA by alkaline hydrolysis, the first strand of cDNA was precipitated using isopropanol in order to eliminate residual primers.

[0119] The second strand of the cDNA was synthesized with a Klenow fragment using a primer corresponding to the 5' end of the ligated oligonucleotide. Methylated dCTP was also used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during the cloning process.
[0120] Following cDNA synthesis, the cDNAs were cloned into pBlueScript as described in Example 3 below.

EXAMPLE 3 

Cloning of cDNAs derived from mRNA with intact 5' ends 
into BlueScript 

[0121] Following second strand synthesis, the ends of the cDNA were blunted with T4 DNA polymerase (Biolabs) and the cDNA was digested with EcoRI. Since methylated dCTP was used during cDNA synthesis, the EcoRI site present in the tag was the only hemi-methylated site, hence the only site susceptible to EcoRI digestion. The cDNA was then size fractionated using exclusion chromatography (AcA, Biosepra) and fractions corresponding to cDNAs of more than 150 bp were pooled and ethanol precipitated. The cDNA was directionally cloned into the SmaI and EcoRI ends of the phagemid pBlueScript vector (Stratagene). The ligation mixture was electroporated into bacteria and propagated under appropriate antibiotic selection.
[0122] Clones containing the oligonucleotide tag attached were then selected as described in Example 4 below.

EXAMPLE 4 


Selection of Clones Having the Oligonucleotide Tag 
Attached Thereto 

[0123] The plasmid DNAs containing 5' EST libraries made as described above were purified (Qiagen). A positive selection of the tagged clones was performed as follows. Briefly, in this selection procedure, the plasmid DNA was converted to single stranded DNA using gene II endonuclease of the phage F1 in combination with an exonuclease (Chang et al., Gene 127:95-8, 1993) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA was then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124-131, 1992. In this procedure, the single stranded DNA was hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the oligonucleotide tag. Clones including a sequence complementary to the biotinylated oligonucleotide were captured by incubation with streptavidin coated magnetic beads followed by magnetic selection. After capture of the positive clones, the plasmid DNA was released from the magnetic beads and converted into double stranded DNA using a DNA polymerase such as the ThermoSequenase obtained from Amersham Pharmacia Biotech. The double stranded DNA was then electroporated into bacteria. The percentage of positive clones having the 5' tag oligonucleotide was estimated by dot blot analysis to typically range between 90 and 98%.

[0124] Following electroporation, the libraries were ordered in 384-well microtiter plates (MTP). A copy of the MTP was stored for future needs. Then the libraries were transferred into 96-well MTP and sequenced as described below.


EXAMPLE 5 

Sequencing of Inserts in Selected Clones

[0125] Plasmid inserts were first amplified by PCR on PE-9600 thermocyclers (Perkin-Elmer, Applied Biosystems Division, Foster City, CA), using standard SETA-A and SETA-B primers (Genset SA), AmpliTaq Gold (Perkin-Elmer), dNTPs (Boehringer), buffer and cycling conditions as recommended by the Perkin-Elmer Corporation.

[0126] PCR products were then sequenced using automatic ABI Prism 377 sequencers (Perkin-Elmer). Sequencing reactions were performed using PE 9600 thermocyclers with standard dye-primer chemistry and ThermoSequenase (Amersham Pharmacia Biotech). The primers used were either T7 or 21M13 (available from Genset SA) as appropriate. The primers were labeled with the JOE, FAM, ROX and TAMRA dyes. The dNTPs and ddNTPs used in the sequencing reactions were purchased from Boehringer. Sequencing buffer, reagent concentrations and cycling conditions were as recommended by Amersham.
[0127] Following the sequencing reaction, the samples were precipitated with ethanol, resuspended in formamide loading buffer, and loaded on a standard 4% acrylamide gel. Electrophoresis was performed for 2.5 hours at 3000 V on an ABI 377 sequencer, and the sequence data were collected and analyzed using the ABI Prism DNA Sequencing Analysis Software, version 2.1.2.

EXAMPLE 6 

Obtaining 5' ESTs from Full-length cDNA libraries 
Obtained from mRNA with Intact 5' Ends 

[0128] Alternatively, 5'ESTs may be isolated from other cDNA or genomic DNA libraries. Such cDNA or genomic DNA libraries may be obtained from a commercial source or made using other techniques familiar to those skilled in the art. One example of such cDNA library construction, a full-length cDNA library, is as follows.
[0129] PolyA+ RNAs are prepared and their quality checked as described in Example 1. Then, the caps at the 5' ends of the polyA+ RNAs are specifically joined to an oligonucleotide tag. The oligonucleotide tag may contain a restriction site such as EcoRI to facilitate further subcloning procedures. Northern blotting is then performed to check the size of mRNAs having the oligonucleotide tag attached thereto and to ensure that the mRNAs were actually tagged.
[0130] First strand synthesis is subsequently carried out for mRNAs joined to the oligonucleotide tag as described in Example 2 above, except that the random nonamers are replaced by an oligo-dT primer. For instance, this oligo-dT primer may contain an internal tag of 4 nucleotides which is different from one tissue to the other. Following second strand synthesis using a primer contained in the oligonucleotide tag attached to the 5' end of mRNA, the blunt ends of the obtained double stranded full-length DNAs are modified into cohesive ends to facilitate subcloning. For example, the extremities of full-length cDNAs may be modified to allow subcloning into the EcoRI and HindIII sites of a Bluescript vector using the EcoRI site of the oligonucleotide tag and the addition of a HindIII adaptor to the 3' end of full-length cDNAs.

[0131] The full-length cDNAs are then separated into several fractions according to their sizes using techniques familiar to those skilled in the art. For example, electrophoretic separation may be applied in order to yield 3 or 6 different fractions. Following gel extraction and purification, the cDNA fractions are subcloned into appropriate vectors, such as Bluescript vectors, transformed into competent bacteria and propagated under appropriate antibiotic conditions. Subsequently, plasmids containing tagged full-length cDNAs are positively selected as described in Example 4.

EP 1 033 401 A2
[0132] The 5' ends of full-length cDNAs isolated from such cDNA libraries may then be sequenced as described in Example 5.

II.2. Computer Analysis of the Isolated 5' ESTs: construction of NetGene™ and SignalTag™ databases

[0133] The sequence data from the 42 cDNA libraries made as described above were transferred to a database, where quality control and validation steps were performed. A base-caller, running on a Unix system, automatically flagged suspect peaks, taking into account the shape of the peaks, the inter-peak resolution, and the noise level. The proprietary base-caller also performed an automated trimming. Any stretch of 25 or fewer bases having more than 4 suspect peaks was considered unreliable and was discarded. Sequences corresponding to cloning vector or ligation oligonucleotides were automatically removed from the EST sequences. However, the resulting EST sequences may contain 1 to 5 bases belonging to the above mentioned sequences at their 5' end. If needed, these can easily be removed on a case-by-case basis.
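The trimming rule of [0133] can be illustrated with a minimal sketch. It assumes the base-caller output is available as one boolean "suspect" flag per base (a hypothetical input format) and checks windows of exactly 25 bases: in a sequence of at least 25 bases, any shorter stretch with more than 4 suspect peaks lies inside some 25-base window that then also has more than 4. Keeping only the longest clean segment afterwards is an assumption; the text says only that unreliable stretches were discarded. The function names are illustrative.

```python
from typing import List

WINDOW = 25       # stretch length from the text ("25 or fewer bases")
MAX_SUSPECT = 4   # more than this many suspect peaks makes a stretch unreliable


def unreliable_mask(suspect: List[bool], window: int = WINDOW,
                    max_suspect: int = MAX_SUSPECT) -> List[bool]:
    """Flag every base lying in a window of `window` bases that
    contains more than `max_suspect` suspect peaks."""
    n = len(suspect)
    mask = [False] * n
    w = min(window, n)
    if w == 0:
        return mask
    count = sum(suspect[:w])  # suspect peaks in the first window
    for start in range(n - w + 1):
        if start > 0:
            # slide the window one base to the right
            count += suspect[start + w - 1] - suspect[start - 1]
        if count > max_suspect:
            for i in range(start, start + w):
                mask[i] = True
    return mask


def trim(seq: str, suspect: List[bool]) -> str:
    """Keep the longest run of bases not flagged as unreliable
    (assumed policy, not stated in the source)."""
    mask = unreliable_mask(suspect)
    best_start, best_end = 0, 0
    run_start = None
    for i, bad in enumerate(mask + [True]):  # trailing sentinel closes the last run
        if not bad and run_start is None:
            run_start = i
        elif bad and run_start is not None:
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = None
    return seq[best_start:best_end]
```

For a 60-base read with five consecutive suspect peaks at positions 30 to 34, every 25-base window holding all five is flagged, so bases 10 to 54 are dropped and the first 10 bases are kept.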
[0134] Following sequencing as described above, the sequences of the 5' ESTs were entered in NetGene™, a database for storage and manipulation as described below and as depicted in Figure 1. Before searching the ESTs in the NetGene™ database for sequences of interest, ESTs derived from mRNAs which were not of interest, such as endogenous or exogenous contaminants, redundant sequences, small sequences, highly degenerate sequences, or repeated sequences, were identified and eliminated from further consideration.
[0135] In order to determine the accuracy of the sequencing procedure as well as the efficiency of the 5' selection described above, the analyses described in Examples 7 and 8, respectively, were performed on 5'ESTs obtained from the NetGene™ database following the elimination of sequences which were not of interest.

EXAMPLE 7

Measurement of Sequencing Accuracy by Comparison 
to Known Sequences 

[0136] To further determine the accuracy of the sequencing procedure described in Example 5, the sequences of NetGene™ 5' ESTs derived from known sequences were identified and compared to the original known sequences. First, a FASTA analysis with overhangs shorter than 5 bp on both ends was conducted on the 5' ESTs to identify those matching an entry in the public human mRNA database. The 6655 5' ESTs which matched a known human mRNA were then realigned with their cognate mRNA, and dynamic programming was used to include substitutions, insertions, and deletions in the list of "errors" which would be recognized. Errors occurring in the last 10 bases of the 5' EST sequences were ignored to avoid the inclusion of spurious cloning sites in the analysis of sequencing accuracy.
[0137] This analysis revealed that the sequences incorporated in the NetGene™ database had an accuracy of more than 99.5%.
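The error count of [0136] can be sketched with edit-distance dynamic programming. This is a minimal illustration under stated assumptions: unit cost for substitutions, insertions and deletions, and free mRNA overhangs so that the EST may align anywhere inside its cognate mRNA. The scoring actually used after the FASTA search is not given in the text, and `count_errors` is a hypothetical name.

```python
from typing import Tuple


def count_errors(est: str, mrna: str, ignore_tail: int = 10) -> Tuple[int, int]:
    """Align `est` inside its cognate `mrna` (mRNA overhangs cost nothing) and
    return (errors, bases_scored). Substitutions, insertions and deletions each
    count as one error; errors in the last `ignore_tail` EST bases are skipped."""
    n, m = len(est), len(mrna)
    # dp[i][j]: fewest edits aligning est[:i] to an mRNA stretch ending at j
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (est[i - 1] != mrna[j - 1]),  # match or substitution
                dp[i - 1][j] + 1,   # extra base in the EST (insertion)
                dp[i][j - 1] + 1,   # base missing from the EST (deletion)
            )
    j = min(range(m + 1), key=lambda c: dp[n][c])  # best mRNA end point
    i, errors = n, 0
    cutoff = n - ignore_tail  # EST positions >= cutoff are not scored
    while i > 0:  # trace back one optimal alignment
        if j > 0 and dp[i][j] == dp[i - 1][j - 1] + (est[i - 1] != mrna[j - 1]):
            if est[i - 1] != mrna[j - 1] and i - 1 < cutoff:
                errors += 1
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            if i - 1 < cutoff:
                errors += 1
            i -= 1
        else:  # deletion located just after EST base i - 1
            if i - 1 < cutoff:
                errors += 1
            j -= 1
    return errors, max(n - ignore_tail, 0)
```

Summing `errors` and `bases_scored` over all realigned ESTs gives an overall accuracy of `1 - total_errors / total_bases`, which is the kind of figure reported in [0137].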


EXAMPLE 8 

Determination of Efficiency of 5' EST Selection 

[0138] To determine the efficiency at which the above selection procedures isolated 5' ESTs which included sequences close to the 5' end of the mRNAs from which they derived, the sequences of the ends of the 5' ESTs derived from the elongation factor 1 subunit α and ferritin heavy chain genes were compared to the known cDNA sequences of these genes. Since the transcription start sites of both genes are well characterized, they may be used to determine the percentage of derived 5' ESTs which included the authentic transcription start sites.
[0139] For both genes, more than 95% of the obtained 5' ESTs actually included sequences close to or upstream of the 5' end of the corresponding mRNAs.
[0140] To extend the analysis of the reliability of the procedures for isolating 5' ESTs from ESTs in the NetGene™ database, a similar analysis was conducted using, for comparison, a database composed of human mRNA sequences extracted from GenBank database release 97. The 5' ends of more than 85% of 5' ESTs derived from mRNAs included in the GenBank database were located close to the 5' ends of the known sequence. As some of the mRNA sequences available in the GenBank database are deduced from genomic sequences, a 5' end matching these sequences will be counted as an internal match. Thus, the method used here underestimates the yield of ESTs including the authentic 5' ends of their corresponding mRNAs.

EXAMPLE 9

Clustering of the 5' ESTs

[0141] Since the cDNA libraries made above include multiple 5' ESTs derived from the same mRNA, overlapping 5'ESTs may be assembled into continuous sequences. The following method (see Figure 1) describes how to efficiently cluster 5'ESTs in order to yield not only consensus 5'EST sequences for mRNAs derived from different genes but also consensus 5'EST sequences for different mRNAs, so-called variants, transcribed from the same gene, such as alternatively spliced mRNAs. This clustering was performed on a set of NetGene™ 5'EST sequences following elimination of endogenous contaminants, elimination of uninformative sequences and masking of repeats.

[0142] The whole set of sequences was first partitioned into smaller sets, so-called clusters, containing sequences exhibiting perfect matches with each other over a given length. Such clusters contain 5'ESTs derived from a small number of different genes. Some 5'EST sequences were not clustered using this approach, either because they were not homologous to any other sequence or because the homology was not properly detected. To overcome this problem, sequences not clustered, so-called singletons, may be compared to the consensus contigated ESTs obtained later on and, if necessary, included in the appropriate clusters and used to compute other consensus contigated ESTs.
[0143] Thereafter, all variants of a given gene were identified in each cluster as follows. Overlapping sequences inside a given cluster were represented as oriented graphs in which each sequence was a node and each overlap an edge. Then, the different genes contained within a single graph, which were represented by different connected components, were identified and isolated from each other. Subsequently, the different variants of the same gene were isolated using an algorithm based on the detection of forks within a connected component. If desired, the consensus contigated EST sequences may be verified by identifying clones in nucleic acid samples derived from biological tissues, such as cDNA libraries, which hybridize to probes based on the sequences of the consensus contigated ESTs, and sequencing them.
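The graph step of [0143] can be sketched as follows. The sketch assumes overlaps have already been detected and are supplied as oriented (upstream, downstream) pairs of hypothetical EST identifiers. Connected components are found with union-find, one component per gene. The fork test (a node with more than one successor or more than one predecessor) is a simple stand-in for the variant-detection algorithm, whose details the text does not give.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (upstream EST, downstream EST) overlap


def connected_components(nodes: Iterable[str], edges: Iterable[Edge]) -> List[List[str]]:
    """Group ESTs into connected components of the overlap graph
    (orientation ignored); each component stands for one gene."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups: Dict[str, List[str]] = defaultdict(list)
    for n in parent:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def forks(edges: Iterable[Edge]) -> List[str]:
    """Nodes where the oriented graph branches (several successors or
    several predecessors): candidate points where variants diverge."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    pred: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)
    branching = {n for n, s in succ.items() if len(s) > 1}
    branching |= {n for n, p in pred.items() if len(p) > 1}
    return sorted(branching)
```

With overlaps e1→e2, e2→e3, e2→e4 and e5→e6, the graph splits into two genes, {e1, e2, e3, e4} and {e5, e6}, and e2 is reported as a fork where two variants of the first gene diverge.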
[0144] Overlapping 5'EST sequences belonging to the same variant, as well as included 5'EST sequences belonging to the same cluster, were then contigated, and consensus contigated 5'EST sequences were generated for each variant. Some of the obtained consensus contigated 5'EST sequences were incomplete, because only included and overlapping 5'EST sequences were considered to isolate genes and because of the algorithm developed to find variants. These variant consensus contigated 5'EST sequences were extended as follows. Variants transcribed from the same gene were compared pairwise, and the 5'EST consensus sequences that were incomplete in 5' and/or in 3' were extended with the appropriate sequence from the other variants. All 5'EST consensus sequences from each cluster, whether or not completed in 5' or 3', were subsequently compared to the whole set of individual 5'EST sequences obtained for this cluster.

EXAMPLE 10

Identification of the Most Probable Open Reading 
Frame of 5' ESTs 

[0145] Subsequently, the most probable coding open reading frame (ORF) may be determined for each consensus assembled 5'EST or 5'EST as follows.
[0146] Each nucleic acid sequence is first divided into several subsequences whose coding propensity is evaluated using different methods known to those skilled in the art, such as the evaluation of N-mer frequency and its variants (Fickett and Tung, Nucleic Acids Res. 20:6441-50 (1992)) or the Average Mutual Information method (Grosse et al., International Conference on Intelligent Systems for Molecular Biology, Montreal, Canada, June 28-July 1, 1998). Each of the scores obtained by the techniques described above is then normalized by the extremities of its distribution, and the normalized scores are fused using a neural network into a single score that represents the coding probability of a given subsequence.
[0147] The coding probability scores obtained for each subsequence, and thus the probability score profiles obtained for each reading frame, are then linked to the initiation codons present on the sequence. For each open reading frame, defined as a nucleic acid sequence of at least 50 nucleotides beginning with an ATG codon, an ORF score is determined. Basically, this score is the sum of the probability scores computed for each subsequence corresponding to the considered ORF in the correct reading frame, corrected by a function that negatively weights locally high score values and positively weights sustained high score values. The chosen ORF is the one with the highest score.
[0148] Two kinds of ORFs are considered. In some embodiments, 5'ESTs encoding ORFs of at least 50 amino acids extending up to the end of the consensus assembled 5'EST sequences are obtained. In other embodiments, 5'ESTs encoding complete ORFs, namely ORFs with start and stop codons, containing at least 100 amino acids are obtained.
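The ORF rules of [0147] and [0148] can be sketched in a few lines. The sketch assumes a single forward strand and takes a per-nucleotide coding-probability list as a hypothetical stand-in for the neural-network scores of [0146]. The ORF score here is a plain sum; the correction function that down-weights isolated peaks is omitted because its form is not given. Function names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

STOPS = {"TAA", "TAG", "TGA"}
Orf = Tuple[int, int, bool]  # (start, end, has_stop); end is exclusive


def find_orfs(seq: str, min_nt: int = 50) -> List[Orf]:
    """ATG-initiated ORFs of at least `min_nt` nucleotides in the three forward
    frames. An ORF runs to the first in-frame stop codon, or to the last
    complete codon when the sequence ends first (ORF open at the 3' end)."""
    seq = seq.upper()
    orfs: List[Orf] = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                j += 3
            has_stop = j + 3 <= len(seq)
            end = j + 3 if has_stop else j
            if end - i >= min_nt:
                orfs.append((i, end, has_stop))
            i = end  # resume after this ORF; nested ATGs are not re-reported
    return orfs


def best_orf(seq: str, coding_prob: Sequence[float], min_nt: int = 50) -> Optional[Orf]:
    """Pick the ORF with the highest summed per-base coding probability."""
    candidates = find_orfs(seq, min_nt)
    if not candidates:
        return None
    return max(candidates, key=lambda o: sum(coding_prob[o[0]:o[1]]))


def orf_category(orf: Orf) -> Optional[str]:
    """Classify an ORF into one of the two kinds described in [0148]."""
    start, end, has_stop = orf
    codons = (end - start) // 3
    if has_stop and codons - 1 >= 100:  # the stop codon encodes no amino acid
        return "complete"
    if not has_stop and codons >= 50:
        return "open at 3' end"
    return None
```

A reading frame that starts with ATG and is followed by 120 sense codons and a stop is classified as a complete ORF of 121 amino acids, while `ATG` followed by 60 sense codons with no stop is an ORF open at the 3' end of 61 amino acids.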

EXAMPLE 11 

Sequence Analysis 

[0149] Application of the clustering method described in Example 9 to a selected set of 126,735 NetGene™ 5'ESTs free from endogenous contaminants and uninformative sequences yielded 9490 consensus assembled 5'EST sequences or variants for a total of 8037 genes clustered, representing 98,973 individual 5'ESTs. One cluster, which contained 21,138 sequences and was shown to contain chimeras by comparison to public sequences, was removed from further analysis.
[0150] Both non-clustered 5'ESTs, i.e. singletons, and consensus contigated 5'ESTs were then compared to already known sequences as follows. Those sequences matching human mRNA sequences were eliminated from further analysis. Then, following masking of repeats, those sequences matching sequences that had already been discovered by the inventors, namely sequences exhibiting more than 90% homology over stretches longer than 40 nucleotides using BLAST2N with overhangs shorter than 10 nucleotides, were removed from further consideration. The final set represents the sequences of the invention (SEQ ID NOs: 24-4100 and 8178-36681), i.e. 7609 consensus contigated 5'ESTs from 6398 clusters containing 31,267 5'ESTs, and 24,972 singletons.
[0151] Of the 6398 obtained clusters, 658 were shown to be multivariant, i.e. to contain several variants of the same gene. Table I gives, for each of the multivariant clusters named by its internal reference (first column), the list of the consensus sequences of all variants, each variant being represented by a different SEQ ID NO.
[0152] Subsequently, the most probable open reading frame was determined, as described in Example 10, for all sequences of the invention. 3,697 5'ESTs (SEQ ID NOs: 24-3720) encoding incomplete ORFs (SEQ ID NOs: 4101-7797) of at least 50 amino acids were found. In addition, 380 5'ESTs (SEQ ID NOs: 3721-4100) encoding complete ORFs (SEQ ID NOs: 7798-8177) of at least 100 amino acids were found.
[0153] The nucleotide sequences of SEQ ID NOs: 24-4100 and 8178-36681 and the amino acid sequences encoded by SEQ ID NOs: 24-4100 (i.e. the amino acid sequences of SEQ ID NOs: 4101-8177) are provided in the appended sequence listing. Some of the amino acid sequences may contain "Xaa" designators. These "Xaa" designators indicate either (1) a residue which cannot be identified because of nucleotide sequence ambiguity or (2) a stop codon in the determined sequence where applicants believe one should not exist (if the sequence were determined more accurately).
[0154] If one of the nucleic acid sequences of SEQ ID NOs: 24-4100 and 8178-36681 is suspected of containing one or more incorrect or ambiguous nucleotides, the ambiguities can readily be resolved by resequencing a fragment containing the nucleotides to be evaluated. If one or more incorrect or ambiguous nucleotides are detected, the corrected sequences should be included in the clusters from which the sequences were isolated, and used to compute other consensus contigated sequences on which other ORFs would be identified. Nucleic acid fragments for resolving sequencing errors or ambiguities may be obtained from deposited clones or can be isolated using the techniques described herein. Resolution of any such ambiguities or errors may be facilitated by using primers which hybridize to sequences located close to the ambiguous or erroneous sequences. For example, the primers may hybridize to sequences within 50-75 bases of the ambiguity or error. Upon resolution of an error or ambiguity, the corresponding corrections can be made in the protein sequences encoded by the DNA containing the error or ambiguity. The amino acid sequence of the protein encoded by a particular clone can also be determined by expression of the clone in a suitable host cell, collecting the protein, and determining its sequence.
[0155] In addition, if one of the sequences of SEQ ID NOs: 4101-8177 is suspected of containing a truncated ORF as the result of a frameshift in the sequence, such frameshifting errors may be corrected by combining the following two approaches. The first involves thorough examination of all double predictions, i.e. all cases where the probability scores for two ORFs located on different reading frames are high and close, preferably differing by less than 0.4. Close examination of the region where the two possible ORFs overlap may help to detect the frameshift. In the second approach, homologies with known proteins are used to correct suspected frameshifts.

EXAMPLE 12 

Identification of Potential Signal Sequences in 5' ESTs

[0156] The amino acid sequences of SEQ ID NOs: 4101-8177 were then searched to identify potential signal motifs using slight modifications of the procedures disclosed in Von Heijne, Nucleic Acids Res. 14:4683-4690, 1986. Those sequences encoding a 15 amino acid long stretch with a score of at least 3.5 in the Von Heijne signal peptide identification matrix were considered to possess a signal sequence and were included in a database called SIGNALTAG™.
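The window search of [0156] can be sketched as below. The actual von Heijne (1986) weight matrix and the "slight modifications" applied to it are not reproduced in the text, so the `matrix` argument is a hypothetical list of 15 per-position weight tables (residue to weight, missing residues scoring 0). Only the 15-residue window and the 3.5 threshold come from the text.

```python
from typing import Dict, List, Tuple

WINDOW = 15        # stretch length from the text
THRESHOLD = 3.5    # minimum score from the text


def best_window(protein: str, matrix: List[Dict[str, float]]) -> Tuple[float, int]:
    """Slide a 15-residue window along `protein`, sum the per-position
    weights from `matrix`, and return (best score, window start)."""
    if len(matrix) != WINDOW:
        raise ValueError("matrix must have one weight table per window position")
    best = (float("-inf"), -1)
    for start in range(len(protein) - WINDOW + 1):
        window = protein[start:start + WINDOW]
        score = sum(matrix[k].get(aa, 0.0) for k, aa in enumerate(window))
        if score > best[0]:
            best = (score, start)
    return best


def has_signal_sequence(protein: str, matrix: List[Dict[str, float]],
                        threshold: float = THRESHOLD) -> bool:
    """True if some 15-residue stretch scores at least `threshold`."""
    score, _ = best_window(protein, matrix)
    return score >= threshold
```

With a toy matrix that gives leucine a weight of 0.5 in the first ten positions, a protein beginning `M` followed by ten leucines reaches a best score of 5.0 and passes the threshold, whereas a protein lacking such a hydrophobic run scores 0 and does not.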
[0157] The 720 nucleic acid sequences containing a signal sequence (SEQ ID NOs: 24-652 and 3721-3811) and the corresponding polypeptides with a potential signal peptide (SEQ ID NOs: 4101-4729 and 7798-7888) are provided in the Sequence Listing appended hereto. The signal peptides of such polypeptides are indicated as features in the appended Sequence Listing. It should be noted that, in accordance with the regulations governing Sequence Listings, in the appended Sequence Listing the full protein (i.e. the protein containing the signal peptide and the mature protein) extends from an amino acid residue having a negative number through a positively numbered C-terminal amino acid residue. Thus, the first amino acid of the mature protein resulting from cleavage of the signal peptide is designated as amino acid number 1, and the first amino acid of the signal peptide is designated with the appropriate negative number.
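The numbering convention of [0157] reduces to a simple rule: signal peptide residues run from minus the signal peptide length up to -1, and the mature protein starts at +1, with no residue numbered 0. The sketch assumes the signal peptide length is already known (for instance from the predicted cleavage site); `listing_numbers` is a hypothetical helper name.

```python
from typing import List


def listing_numbers(protein_length: int, signal_length: int) -> List[int]:
    """Residue numbers as used in the Sequence Listing: the signal peptide runs
    from -signal_length to -1 and the mature protein starts at +1 (no zero)."""
    if not 0 <= signal_length <= protein_length:
        raise ValueError("signal peptide must fit inside the protein")
    return [i - signal_length if i < signal_length else i - signal_length + 1
            for i in range(protein_length)]
```

For a six-residue protein with a three-residue signal peptide, the residues are numbered -3, -2, -1, 1, 2, 3.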
[0158] To confirm the accuracy of the above method for identifying signal sequences, the analysis of Example 13 was performed.

EXAMPLE 13 

Confirmation of Accuracy of Identification of Potential 
Signal Sequences in 5' ESTs 

[0159] The accuracy of the above procedure for identifying signal sequences encoding signal peptides was evaluated by applying the method to the 43 amino acids located at the N terminus of all human SwissProt proteins. The computed Von Heijne score for each protein was compared with the known characterization of the protein as being a secreted protein or a non-secreted protein. In this manner, the number of non-secreted proteins having a score higher than 3.5 (false positives) and the number of secreted proteins having a score lower than 3.5 (false negatives) could be calculated.
[0160] Using the results of the above analysis, the probability that a peptide encoded by the 5' region of the mRNA is in fact a genuine signal peptide, given its Von Heijne score, was calculated under either the assumption that 10% of human proteins are secreted or the assumption that 20% of human proteins are secreted. The results of this analysis are shown in Figure 2.
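The calculation of [0160] can be read as an application of Bayes' rule, which is the assumption behind this sketch. Writing π for the assumed fraction of secreted proteins (0.10 or 0.20), TPR for the fraction of secreted SwissProt proteins scoring at least 3.5, and FPR for the fraction of non-secreted proteins doing so, the probability that a protein above the threshold is truly secreted is π·TPR / (π·TPR + (1 - π)·FPR). Function names are illustrative.

```python
from typing import Sequence, Tuple


def rates(secreted: Sequence[float], non_secreted: Sequence[float],
          threshold: float = 3.5) -> Tuple[float, float]:
    """True-positive and false-positive rates of the rule 'score >= threshold',
    measured on proteins whose secretion status is known."""
    tpr = sum(s >= threshold for s in secreted) / len(secreted)
    fpr = sum(s >= threshold for s in non_secreted) / len(non_secreted)
    return tpr, fpr


def posterior(prior: float, tpr: float, fpr: float) -> float:
    """Bayes' rule: probability that a protein passing the threshold is secreted,
    given an assumed fraction `prior` of secreted proteins.
    Undefined (division by zero) if nothing passes the threshold."""
    hit_secreted = prior * tpr
    hit_other = (1.0 - prior) * fpr
    return hit_secreted / (hit_secreted + hit_other)
```

With illustrative rates that are not taken from Figure 2 (TPR 0.9, FPR 0.05), `posterior(0.10, 0.9, 0.05)` is about 0.67 and `posterior(0.20, 0.9, 0.05)` is about 0.82, which shows why the result depends on the assumed proportion of secreted proteins.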
[0161] Using the above method of identification of secretory proteins, 5' ESTs of the following polypeptides known to be secreted were obtained: human glucagon, gamma interferon induced monokine precursor, secreted cyclophilin-like protein, human pleiotrophin, and human biotinidase precursor. Thus, the above method successfully identified those 5' ESTs which encode a signal peptide.

[0162] To confirm that the signal peptide encoded by the 5' ESTs or contigated consensus 5' ESTs actually functions as a signal peptide, the signal sequences from the 5' ESTs or consensus 5' ESTs may be cloned into a vector designed for the identification of signal peptides. Such vectors are designed to confer the ability to grow in selective medium only to host cells containing a vector with an operably linked signal sequence. For example, to confirm that a 5' EST or consensus 5' EST encodes a genuine signal peptide, the signal sequence of the 5' EST or consensus 5' EST may be inserted upstream of, and in frame with, a non-secreted form of the yeast invertase gene in signal peptide selection vectors such as those described in U.S. Patent No. 5,536,637. Growth of host cells containing signal sequence selection vectors with the correctly inserted 5' EST or consensus 5' EST signal sequence confirms that the 5' EST or consensus 5' EST encodes a genuine signal peptide.
[0163] Alternatively, the presence of a signal peptide may be confirmed by cloning the extended cDNAs obtained using the ESTs or consensus 5' ESTs into expression vectors such as pXT1 as described below, or by constructing promoter-signal sequence-reporter gene vectors which encode fusion proteins between the signal peptide and an assayable reporter protein. After introduction of these vectors into a suitable host cell, such as COS cells or NIH 3T3 cells, the growth medium may be harvested and analyzed for the presence of the secreted protein. The medium from these cells is compared to the medium from control cells containing vectors lacking the signal sequence or extended cDNA insert to identify vectors which encode a functional signal peptide or an authentic secreted protein.

EXAMPLE 14 

Assessment of the novelty rate of 5'ESTs

[0164] To assess the yield of new sequences, the obtained 5'ESTs and consensus contigated 5'ESTs were compared to all known human mRNAs extracted from EMBL release 57 and the daily updates available at the time of filing. The comparison was performed using BLAST2N on both strands following masking of the repeats. Sequences having more than 95% homology with public sequences over their whole length, with overhangs of at most 10 nucleotides at each extremity, were considered as previously identified. By this criterion, about 90% of 5'ESTs or consensus assembled 5'ESTs were considered unidentified.
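The novelty criterion of [0164] can be expressed as a predicate over alignment summaries. The `Hit` record below is a hypothetical summary of the best BLAST2N alignment for each query, not BLAST's own output format, and only overhangs on the query side are checked, which is an assumption. A query with no qualifying hit counts as new.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Hit:
    """Best alignment of one query against public mRNAs (hypothetical record)."""
    query_len: int
    q_start: int      # first aligned query base, 0-based
    q_end: int        # one past the last aligned query base
    identity: float   # percent identity over the aligned region


def previously_identified(hit: Hit, min_identity: float = 95.0,
                          max_overhang: int = 10) -> bool:
    """More than 95% identity over the whole query, allowing at most
    10 unaligned bases at each extremity."""
    left = hit.q_start                  # unaligned bases at the 5' extremity
    right = hit.query_len - hit.q_end   # unaligned bases at the 3' extremity
    return (hit.identity > min_identity
            and left <= max_overhang and right <= max_overhang)


def novelty_rate(query_ids: Iterable[str], best_hits: Dict[str, Hit]) -> float:
    """Fraction of queries not considered previously identified."""
    ids = list(query_ids)
    known = sum(1 for q in ids
                if q in best_hits and previously_identified(best_hits[q]))
    return 1.0 - known / len(ids)
```

A 500-base EST aligned from base 3 to base 495 at 99% identity counts as previously identified, whereas the same EST aligned only up to base 300, or aligned end to end at 90% identity, does not.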


II.3. Evaluation of Spatial and Temporal Expression of mRNAs Corresponding to the 5'ESTs or Extended cDNAs

[0165] Each of the SEQ ID NOs: 24-4100 and 8178-36681 was also categorized based on the tissue from which its corresponding mRNA was obtained, as described below in Example 15.



EXAMPLE 15

Expression Patterns of mRNAs From Which the 5'ESTs were Obtained

[0166] Table II shows the spatial distribution of each of the 5'ESTs (non-clustered ESTs) and of each consensus contigated EST. Table II provides the SEQ ID NOs: of the 5' ESTs (referred to alternatively herein as non-clustered ESTs or singletons) and consensus contigated ESTs. Table II also lists the number of ESTs from each type of tissue which were used to assemble the contigated consensus ESTs. The SEQ ID NOs: in Table II which contain a single 5' EST from a single tissue are 5' ESTs. Each type of tissue listed in Table II is encoded by a letter. The correspondence between the letter code and the tissue type is given in Table III. For example, the consensus contigated EST of SEQ ID NO: 47 contains one 5'EST from cancerous prostate, two 5'ESTs from lymph ganglia, and two 5'ESTs from testes.

[0167] In addition to categorizing the 5' ESTs and consensus contigated 5' ESTs with respect to their tissue of origin, the spatial and temporal expression patterns of the mRNAs corresponding to the 5' ESTs and consensus contigated 5' ESTs, as well as their expression levels, may be determined as described in Example 16 below.

[0168] Characterization of the spatial and temporal expression patterns and expression levels of these mRNAs is useful for constructing expression vectors capable of producing a desired level of gene product in a desired spatial or temporal manner, as will be discussed in more detail below.

[0169] Furthermore, 5' ESTs and consensus contigated 5' ESTs whose corresponding mRNAs are associated with disease states may also be identified. For example, a particular disease may result from the lack of expression, overexpression, or underexpression of an mRNA corresponding to a 5' EST or consensus contigated 5' EST. By comparing mRNA expression patterns and quantities in samples taken from healthy individuals with those from individuals suffering from a particular disease, 5' ESTs or consensus contigated 5' ESTs responsible for the disease may be identified.
[0170] It will be appreciated that the results of the above characterization procedures for 5' ESTs and consensus contigated 5' ESTs also apply to extended cDNAs (obtainable as described below) which contain sequences adjacent to the 5' ESTs and consensus contigated 5' ESTs. It will also be appreciated that, if desired, characterization may be delayed until extended cDNAs have been obtained rather than characterizing the 5' ESTs or consensus contigated 5' ESTs themselves.

EXAMPLE 16 

Evaluation of Expression Levels and Patterns of 
mRNAs Corresponding to EST-Related Nucleic Acids 

[0171] Expression levels and patterns of mRNAs corresponding to EST-related nucleic acids may be analyzed by solution hybridization with long probes as described in International Patent Application No. WO 97/05277. Briefly, an EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid corresponding to the gene encoding the mRNA to be characterized is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, the EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid is 100 or more nucleotides in length. The plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified ribonucleotides (i.e. biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of interest. The hybridizations are performed under standard stringent conditions (40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (i.e. RNases CL3, T1, Phy M, U2 or A). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.
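The probe described in [0171] is antisense RNA, i.e. the reverse complement of the sense strand written with U in place of T. The following minimal Python sketch illustrates only that sequence relationship, assuming the cloned EST-related insert is supplied as sense-strand DNA; the function name `antisense_rna` is hypothetical, and the 100-nucleotide default simply mirrors the preferred probe length stated above.

```python
# Map each sense-strand DNA base to its RNA complement (A->U, C->G, G->C, T->A).
RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def antisense_rna(sense_dna: str, min_len: int = 100) -> str:
    """Return the antisense RNA (written 5'->3') for a sense-strand DNA insert.

    Raises ValueError for non-ACGT characters or for inserts shorter than
    `min_len` nucleotides (the preferred probe length is 100 nt or more).
    """
    seq = sense_dna.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("insert must contain only A, C, G and T")
    if len(seq) < min_len:
        raise ValueError(f"insert is {len(seq)} nt; at least {min_len} nt preferred")
    # Complement each base, then reverse so the result reads 5'->3'.
    return seq.translate(RNA_COMPLEMENT)[::-1]


probe = antisense_rna("ATGC" * 25)
print(probe[:8])  # GCAUGCAU
```

Labeling with biotin-UTP and DIG-UTP is a property of the transcription reaction and is not modelled here.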
[0172] The EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid may also be tagged with nucleotide sequences for the serial analysis of gene expression (SAGE) as disclosed in UK Patent Application No. 2 305 241 A. In this method, cDNAs are prepared from a cell, tissue, organism or other source of nucleic acid for which gene expression patterns must be determined. The resulting cDNAs are separated into two pools. The cDNAs in each pool are cleaved with a first restriction endonuclease, called an anchoring enzyme, having a recognition site which is likely to be present at least once in most cDNAs. The fragments which contain the 5'-most or 3'-most region of the cleaved cDNA are isolated by binding to a capture medium such as streptavidin-coated beads. A first oligonucleotide linker having a first sequence for hybridization of an amplification primer and an internal restriction site for a so-called tagging endonuclease is ligated to the digested cDNAs in the first pool. Digestion with this second endonuclease (the tagging endonuclease) produces short tag fragments from the cDNAs.
[0173] A second oligonucleotide having a second sequence for hybridization of an amplification primer and an internal restriction site is ligated to the digested cDNAs in the second pool. The cDNA fragments in the second pool are also digested with the tagging endonuclease to generate short tag fragments derived from the cDNAs in the second pool. The tags resulting from digestion of the first and second pools with the anchoring enzyme and the tagging endonuclease are ligated to one another to produce so-called ditags. In some embodiments, the ditags are concatemerized to produce ligation products containing from 2 to 200 ditags. The tag sequences are then determined and compared to the sequences of the EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid to determine which 5' ESTs, contigated consensus 5' ESTs, or extended cDNAs are expressed in the cell, tissue, organism, or other source of nucleic acids from which the tags were derived. In this way, the expression pattern of the 5' ESTs, contigated consensus 5' ESTs, or extended cDNAs in the cell, tissue, organism, or other source of nucleic acids is obtained.
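The tag-counting logic of [0172]-[0173] can be sketched in Python. The sketch assumes an anchoring-enzyme site of CATG (the NlaIII site of the original SAGE protocol) and 10-base tags; neither value is specified in the text above. Ditag ligation and concatemer sequencing are collapsed into reading one tag per cDNA from the region 3' of its 3'-most anchoring site, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme site (NlaIII in classic SAGE)
TAG_LEN = 10          # assumed tag length released by the tagging endonuclease


def extract_tag(cdna: str) -> Optional[str]:
    """Return the tag immediately 3' of the 3'-most anchoring site, or None."""
    pos = cdna.rfind(ANCHOR_SITE)
    if pos == -1:
        return None  # no anchoring site: this cDNA yields no tag
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def count_tags(cdnas: Iterable[str]) -> Counter:
    """Count tags across a cDNA population (a stand-in for sequenced ditags)."""
    tags = (extract_tag(c) for c in cdnas)
    return Counter(t for t in tags if t is not None)


def expression_by_reference(tag_counts: Counter,
                            references: Dict[str, str]) -> Dict[str, int]:
    """Report the observed count of each reference sequence's own tag."""
    result = {}
    for name, seq in references.items():
        tag = extract_tag(seq)
        if tag is not None:
            result[name] = tag_counts.get(tag, 0)
    return result
```

In practice a reference sequence can only be matched this way if it extends far enough 3' to contain the same anchoring site as the full-length transcript.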

[0174] Quantitative analysis of gene expression may also be performed using arrays. As used herein, the term array means a one-dimensional, two-dimensional, or multidimensional arrangement of EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids. Preferably, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are at least 15 nucleotides in length. More preferably, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are at least 100 nucleotides long. More preferably, the fragments are more than 100 nucleotides in length. In some embodiments, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids may be more than 500 nucleotides long.

[0175] For example, quantitative analysis of gene expression may be performed with EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids in a complementary DNA microarray as described by Schena et al. (Science 270:467-470, 1995; Proc. Natl. Acad. Sci. U.S.A. 93:10614-10619, 1996). EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are amplified by PCR and arrayed from 96-well microtiter plates onto silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C.
[0176] Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency wash buffer (1x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1x SSC/0.2% SDS). Arrays are scanned in 0.1x SSC using a fluorescence laser scanning device fitted with a custom filter set. Accurate differential expression measurements are obtained by taking the average of the ratios of two independent hybridizations.
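The final step of [0176], averaging the ratios from two independent hybridizations, is shown below as a short Python sketch. It assumes that each hybridization yields a test-channel and a reference-channel intensity for every array element (as in two-colour fluorescent microarrays), and that background subtraction and normalization have already been applied. The data layout and the function name `mean_ratio` are hypothetical.

```python
from typing import Dict, Tuple

# Per array element: (test-channel intensity, reference-channel intensity).
Hybridization = Dict[str, Tuple[float, float]]


def mean_ratio(hyb1: Hybridization, hyb2: Hybridization) -> Dict[str, float]:
    """Average the test/reference ratio of each element over two hybridizations.

    Elements missing from either hybridization, or with a non-positive
    reference intensity in either one, are skipped.
    """
    result = {}
    for element in sorted(hyb1.keys() & hyb2.keys()):
        (t1, r1), (t2, r2) = hyb1[element], hyb2[element]
        if r1 <= 0 or r2 <= 0:
            continue
        result[element] = (t1 / r1 + t2 / r2) / 2
    return result


print(mean_ratio({"est_A": (400.0, 100.0)}, {"est_A": (200.0, 100.0)}))  # {'est_A': 3.0}
```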
[0177] Quantitative analysis of the expression of genes may also be performed with EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids in complementary DNA arrays as described by Pietu et al. (Genome Research 6:492-503, 1996). The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are PCR amplified and spotted on membranes. Then, mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. After hybridization and washing under controlled conditions, the hybridized mRNAs are detected by phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative analysis of differentially expressed mRNAs is then performed.

[0178] Alternatively, expression analysis of the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids can be done through high density nucleotide arrays as described by Lockhart et al. (Nature Biotechnology 14:1675-1680, 1996) and Sosnowsky et al. (Proc. Natl. Acad. Sci. U.S.A. 94:1119-1123, 1997). Oligonucleotides of 15-50 nucleotides corresponding to sequences of EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowsky et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
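Choosing the oligonucleotides of [0178] amounts to taking windows of 15-50 nucleotides (preferably about 20) from the EST-related sequence. The minimal Python sketch below tiles such probes at a fixed step. The step size and the function name `tile_probes` are illustrative assumptions; real probe selection would also screen candidates for base composition and cross-hybridization.

```python
from typing import List, Tuple


def tile_probes(seq: str, length: int = 20, step: int = 1) -> List[Tuple[int, str]]:
    """Return (start, oligonucleotide) pairs covering `seq` with `length`-nt windows."""
    if not 15 <= length <= 50:
        raise ValueError("oligonucleotide length should be 15-50 nt")
    if step < 1:
        raise ValueError("step must be positive")
    return [(i, seq[i:i + length]) for i in range(0, len(seq) - length + 1, step)]


probes = tile_probes("ACGTTGCAAGGCTTACCGATCGATT", length=20, step=5)
print(probes)  # [(0, 'ACGTTGCAAGGCTTACCGAT'), (5, 'GCAAGGCTTACCGATCGATT')]
```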
[0179] cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or fluorescent dye, are synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides. These probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and application of different electric fields (Sosnowsky et al., supra), the dyes or labeling compounds are detected and quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples indicates differential expression of the mRNA corresponding to the 5' EST, consensus contigated 5' EST or extended cDNA from which the oligonucleotide sequence has been designed.

III. Use of 5' ESTs to Clone Extended cDNAs and to Clone the Corresponding Genomic DNAs

[0180] Once 5' ESTs or consensus contigated 5' ESTs which include the 5' end of the corresponding mRNAs have been selected using the procedures described above, they can be utilized to isolate extended cDNAs which contain sequences adjacent to the 5' ESTs or contigated consensus 5' ESTs. The extended cDNAs may include the entire coding sequence of the protein encoded by the corresponding mRNA, including the authentic translation start site. If the extended cDNA encodes a secreted protein, it may contain the signal sequence and the sequence encoding the mature protein remaining after cleavage of the signal peptide. Extended cDNAs which include the entire coding sequence of the protein encoded by the corresponding mRNA are referred to herein as full-length cDNAs. Alternatively, the extended cDNAs may not include the entire coding sequence of the protein encoded by the corresponding mRNA, although they do include sequences adjacent to the 5' ESTs or contigated consensus 5' ESTs. In some embodiments in which the extended cDNAs are derived from an mRNA encoding a secreted protein, the extended cDNAs may include only the sequence encoding the mature protein remaining after cleavage of the signal peptide, or only the sequence encoding the signal peptide.

[0181] Example 17 below describes a general method for obtaining extended cDNAs using 5' ESTs or consensus contigated 5' ESTs. Example 28 below describes the cloning and sequencing of several extended cDNAs, including extended cDNAs which include the entire coding sequence and authentic 5' end of the corresponding mRNA for several secreted proteins.
[0182] The methods of Examples 17 and 18 can also be used to obtain extended cDNAs which encode less than the entire coding sequence of proteins encoded by the genes corresponding to the 5' ESTs or consensus contigated 5' ESTs. In some embodiments, the extended cDNAs isolated using these methods encode at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of one of the proteins encoded by the
sequences of SEQ ID NOs: 24-4100 and 8178-3668V 
In some embodiments, the extended cDNAs isolated using these methods encode at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of one of the proteins encoded by the sequences of SEQ ID NOs: 24-4100.

EXAMPLE 17 

General Method for Using 5' ESTs to Clone and 
Sequence Extended cDNAs which Include the Entire 
Coding Region and the Authentic 5' End of the
Corresponding mRNA 

[0183] The following general method has been used to quickly and efficiently isolate extended cDNAs including sequence adjacent to the sequences of the 5' ESTs used to obtain them. This method may be applied to obtain extended cDNAs for any 5' EST or consensus contigated 5' EST of the present invention, including those 5' ESTs and consensus contigated 5' ESTs encoding secreted proteins. This method is summarized in Figure 3.

1. Obtaining Extended cDNAs 

a) First strand synthesis

[0184] The method takes advantage of the known 5' sequence of the mRNA. A reverse transcription reaction is conducted on purified mRNA with a poly dT primer containing a nucleotide sequence at its 5' end allowing the addition of a known sequence at the end of the cDNA which corresponds to the 3' end of the mRNA. Such a primer and a commercially available reverse transcriptase enzyme are added to a buffered mRNA sample, yielding a reverse transcript anchored at the 3' polyA site of the RNAs. Nucleotide monomers are then added to complete the first strand synthesis.
[0185] After removal of the mRNA hybridized to the first cDNA strand by alkaline hydrolysis, the products of the alkaline hydrolysis and the residual poly dT primer can be eliminated with an exclusion column.
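The essential point of [0184]-[0185] is that the 5' tail of the poly dT primer becomes a known sequence at the 5' end of the first cDNA strand, opposite the 3' end of the mRNA; this known sequence is what the 3' nested primers later target. The idealized Python model below assumes that the primer anneals exactly at the start of the poly(A) tail. The anchor sequence, the 18-nt dT length and the function names are placeholders, not values from the text.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (both written 5'->3')."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def first_strand_cdna(mrna: str, anchor: str, dt_len: int = 18) -> str:
    """Idealized first-strand cDNA (5'->3') primed at the poly(A) junction.

    `mrna` is given 5'->3' and must end in a poly(A) tail of at least `dt_len`.
    rstrip also removes any A residues immediately preceding the tail, since
    this simple model cannot tell them apart from the tail itself.
    """
    dna = mrna.upper().replace("U", "T")
    body = dna.rstrip("A")
    if len(dna) - len(body) < dt_len:
        raise ValueError("poly(A) tail shorter than the oligo(dT) stretch")
    # The primer (anchor + oligo-dT) forms the 5' end; reverse transcription
    # then copies the mRNA body as its reverse complement.
    return anchor + "T" * dt_len + reverse_complement(body)
```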



b) Second strand synthesis 

[0186] A pair of nested primers on each end is designed based on the known 5' sequence from the 5' EST or contigated consensus 5' EST and the known 3' end added by the poly dT primer used in the first strand synthesis. Software used to design primers is based either on the GC content and melting temperatures of oligonucleotides, such as OSP (Hillier and Green, PCR Meth. Appl. 1:124-128, 1991), or on the octamer frequency disparity method (Griffais et al., Nucleic Acids Res. 19:3887-3891, 1991), such as PC-Rare (http://bioinformatics.weizmann.ac.il/software/PC-Rare/doc/manuel.html).

[0187] Preferably, the nested primers at the 5' end and the nested primers at the 3' end are separated from one another by four to nine bases. These primer sequences may be selected to have melting temperatures and specificities suitable for use in PCR.
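The primer criteria of [0186]-[0187] are illustrated in the small Python sketch below, which rests on several assumptions. It replaces OSP's melting-temperature model with the simple Wallace rule (Tm = 2 x (A+T) + 4 x (G+C) °C), uses illustrative Tm and GC windows, and interprets "separated by four to nine bases" as the gap between the 3' end of the outer primer and the 5' end of the inner primer. Only the forward (5'-end) pair is handled; the 3'-end pair could be chosen in the same way on the reverse complement.

```python
from typing import Optional, Tuple


def gc_content(primer: str) -> float:
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature in degrees C (a rough short-oligo estimate)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def acceptable(primer: str, tm_range=(56, 64), gc_range=(0.4, 0.6)) -> bool:
    """Illustrative acceptance window; the bounds are assumptions."""
    return (tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and gc_range[0] <= gc_content(primer) <= gc_range[1])


def nested_forward_pair(template: str, length: int = 20, min_gap: int = 4,
                        max_gap: int = 9) -> Optional[Tuple[int, int]]:
    """Return start positions (outer, inner) of the first acceptable nested pair."""
    starts = [i for i in range(len(template) - length + 1)
              if acceptable(template[i:i + length])]
    ok = set(starts)
    for outer in starts:
        for gap in range(min_gap, max_gap + 1):
            inner = outer + length + gap  # inner primer begins `gap` bases after outer ends
            if inner in ok:
                return outer, inner
    return None
```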

[0188] A first PCR run is performed using the outer primer from each of the nested pairs. A second PCR run, using the same enzyme and the inner primer from each of the nested pairs, is then performed on a small sample of the first PCR product. Thereafter, the primers and remaining nucleotide monomers are removed.

2. Sequencing of Full Length Extended cDNAs or 
Fragments Thereof 

[0189] Due to the lack of position constraints on the design of 5' nested primers compatible for PCR use using the OSP software, amplicons of two types are obtained. Preferably, the second 5' primer is located upstream of the translation initiation codon, thus yielding a nested PCR product containing the entire coding sequence. Such a full length extended cDNA may be used in a direct cloning procedure. However, in some cases, the second 5' primer is located downstream of the translation initiation codon, thereby yielding a PCR product containing only part of the ORF. Such incomplete PCR products are submitted to a modified procedure described in section b below.

a) Nested PCR products containing complete ORFs

[0190] When the resulting nested PCR product contains the complete coding sequence, as predicted from the 5' EST or consensus contigated 5' EST sequence, it is cloned in an appropriate vector.

b) Nested PCR products containing incomplete ORFs 

[0191] When the amplicon does not contain the complete coding sequence, intermediate steps are necessary to obtain both the complete coding sequence and a PCR product containing the full coding sequence. The complete coding sequence can be assembled from several partial sequences determined directly from different PCR products.

[0192] Once the full coding sequence has been completely determined, new primers compatible for PCR use are then designed to obtain amplicons containing the whole coding region. However, in such cases, 3' primers compatible for PCR use are located inside the 3' UTR of the corresponding mRNA, thus yielding amplicons which lack part of this region, i.e. the polyA tract and sometimes the polyadenylation signal, as illustrated in Figure 3. Such full length extended cDNAs are then cloned into an appropriate vector.

c) Sequencing extended cDNAs

[0193] Sequencing of extended cDNAs can be performed using a Dye Terminator approach with the AmpliTaq DNA polymerase FS kit available from Perkin Elmer.

[0194] In order to sequence PCR fragments, primer walking is performed using software such as OSP to choose primers and automated computer software such as ASMG (Sutton et al., Genome Science Technol. 1:9-19, 1995) to construct contigs of walking sequences including the initial 5' tag using minimum overlaps of 32 nucleotides. Preferably, primer walking is performed until the sequences of full length cDNAs are obtained.
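Contig construction from primer-walking reads, as in [0194], can be sketched as a greedy merge of reads whose suffix and prefix overlap exactly by at least 32 nucleotides. This is a simplified stand-in and not the ASMG algorithm. It assumes error-free reads in a single orientation and does not handle reads contained within other reads. The function names are hypothetical.

```python
from typing import List


def overlap_len(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below min_overlap."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 32) -> List[str]:
    """Repeatedly merge the pair of contigs with the longest qualifying overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_len(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlap of at least min_overlap
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
```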

3. Cloning of Full Length Extended cDNAs

[0195] The PCR product containing the full coding sequence is then cloned in an appropriate vector. For example, the extended cDNAs can be cloned into any ex-

[0196] Since the PCR products described above are blunt ended molecules that can be cloned in either direction, the orientation of several clones for each PCR product is determined. Then, 4 to 10 clones are ordered in microtiter plates and subjected to a PCR reaction using a first primer located in the vector close to the cloning site and a second primer located in the portion of the extended cDNA corresponding to the 3' end of the mRNA. This second primer may be the antisense primer used in anchored PCR in the case of direct cloning (case a) or the antisense primer located inside the 3' UTR in the case of indirect cloning (case b). Clones in which the start codon of the extended cDNA is operably linked to the promoter in the vector so as to permit expression of the protein encoded by the extended cDNA are conserved and sequenced. In addition to the ends of cDNA inserts, approximately 50 bp of vector DNA on each side of the cDNA insert are also sequenced.

[0197] Cloned PCR products are then entirely sequenced in order to obtain at least two sequences per clone. Preferably, the sequences are obtained from both sense and antisense strands according to the aforementioned procedure with the following modifications. First, both 5' and 3' ends of cloned PCR products are sequenced in order to confirm the identity of the clone. Second, primer walking is performed if the full coding region has not yet been obtained. Contigation is then performed using primer walking sequences for cloned products as well as walking sequences that have already been contigated for uncloned PCR products. The sequence is considered complete when the resulting contigs include the whole coding region as well as overlapping sequences with vector DNA on both ends. All the contigated sequences for each cloned amplicon are then used to obtain a consensus sequence.

4. Selection of cloned full length sequences obtained from the 5' ESTs of the present invention

[0198] A negative selection may be performed in order to eliminate unwanted cloned sequences resulting from either contaminants or PCR artifacts as follows. Sequences matching contaminant sequences such as vector DNA, tRNA, mtRNA or rRNA sequences are discarded, as well as those encoding ORF sequences exhibiting extensive homology to repeats. Sequences obtained by direct cloning using nested primers on 5' and 3' tags (section 1, case a) but lacking a polyA tail may be discarded. Only ORFs containing a signal peptide and ending either before the polyA tail (case a) or before the end of the cloned 3' UTR (case b) may be selected. Then, ORFs containing unlikely mature proteins, such as mature proteins whose size is less than 20 amino acids or less than 25% of the immature protein size, may be eliminated.
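The ORF filters at the end of [0198] reduce to simple arithmetic on the predicted signal peptide and ORF lengths. The minimal Python sketch below assumes that each ORF has already been annotated with its length in amino acids, the length of its predicted signal peptide (0 if none), and whether it ends within the region required by case a or case b. The record type and function name are hypothetical.

```python
from typing import NamedTuple


class OrfRecord(NamedTuple):
    name: str
    length_aa: int                # full (immature) protein length
    signal_peptide_aa: int        # 0 when no signal peptide is predicted
    ends_in_allowed_region: bool  # before the polyA tail (case a) or cloned 3' UTR end (case b)


def passes_negative_selection(orf: OrfRecord, min_mature_aa: int = 20,
                              min_mature_fraction: float = 0.25) -> bool:
    """Keep ORFs with a signal peptide, a permitted end point and a plausible mature protein."""
    if orf.signal_peptide_aa == 0 or not orf.ends_in_allowed_region:
        return False
    mature = orf.length_aa - orf.signal_peptide_aa
    # Eliminate mature proteins under 20 aa or under 25% of the immature protein size.
    return mature >= min_mature_aa and mature >= min_mature_fraction * orf.length_aa
```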

[0199] Then, for each remaining full length extended cDNA containing several ORFs, a preselection of ORFs may be performed using the following criteria. The longest ORF with a signal peptide is preferred. If the ORF sizes are similar, the chosen ORF is the one whose signal peptide has the highest score according to the von Heijne method.
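[0199] does not define when ORF sizes count as "similar". The Python sketch below assumes, purely for illustration, that ORFs within 10% of the longest signal-peptide-bearing ORF are similar. It also treats the von Heijne score as a precomputed number. The record type and function name are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CandidateOrf(NamedTuple):
    name: str
    length_aa: int
    has_signal_peptide: bool
    signal_score: float  # von Heijne score of the predicted signal peptide


def preselect_orf(orfs: List[CandidateOrf], tolerance: float = 0.10) -> Optional[CandidateOrf]:
    """Prefer the longest ORF with a signal peptide; among ORFs of similar length
    (within `tolerance` of the longest), pick the one with the best signal score."""
    with_signal = [o for o in orfs if o.has_signal_peptide]
    if not with_signal:
        return None
    longest = max(o.length_aa for o in with_signal)
    similar = [o for o in with_signal if o.length_aa >= (1 - tolerance) * longest]
    return max(similar, key=lambda o: o.signal_score)
```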

[0200] Sequences of full length extended cDNA clones may then be compared pairwise with BLAST after masking of the repeat sequences. Sequences containing at least 90% homology over 30 nucleotides may be clustered in the same class. Each cluster may then be subjected to a cluster analysis that detects sequences resulting from internal priming or from alternative splicing, identical sequences, or sequences with several frameshifts. This automatic analysis serves as a basis for manual selection of the sequences.
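The clustering rule of [0200], under which at least 90% homology over 30 nucleotides places two clones in the same class, can be sketched in Python as single-linkage clustering. Here "homology" is approximated by the best ungapped identity between any pair of 30-nucleotide windows. This is a simplification of the BLAST comparison, and it assumes that repeat masking has already been done. The brute-force search is suitable only for small illustrative inputs, and the function names are hypothetical.

```python
from typing import List


def best_window_identity(a: str, b: str, window: int = 30) -> float:
    """Highest fraction of identical positions over any pair of ungapped windows."""
    best = 0.0
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(wa, b[j:j + window]))
            best = max(best, matches / window)
    return best


def cluster_clones(seqs: List[str], threshold: float = 0.90,
                   window: int = 30) -> List[List[int]]:
    """Single-linkage clustering of sequence indices using a union-find structure."""
    parent = list(range(len(seqs)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if best_window_identity(seqs[i], seqs[j], window) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```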

[0201] Manual selection can be carried out using automatically generated reports for each sequenced full length extended cDNA clone. During this manual procedure, a selection is operated between clones belonging to the same class as follows.

[0202] Selection of full length extended cDNA clones encoding sequences of interest is performed using the following criteria. Structural parameters (initial tag, polyadenylation site and signal) may be checked. Then, homologies with known nucleic acids and proteins may be examined in order to determine whether the clone sequence matches a known nucleic acid/protein sequence and, in the latter case, its covering rate and the date at which the sequence became public. Sequences resulting from chimeras or double inserts, or located on chromosome breaking points as assessed by homology to other sequences, may also be discarded during this procedure.

[0203] Extended cDNAs prepared as described above may be subsequently engineered to obtain nucleic acids which include desired portions of the extended cDNA using conventional techniques such as subcloning, PCR, or in vitro oligonucleotide synthesis. For example, if the extended cDNA is derived from a gene encoding a secreted polypeptide, it may include the full coding sequences (i.e. the sequences encoding the signal peptide and the mature protein remaining after the signal peptide is cleaved off), the sequences encoding the mature polypeptide (i.e. the polypeptide generated after the signal peptide is cleaved off), or only the coding sequences for the signal peptides.
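Once the signal peptide length is known, the three portions named in [0203] correspond to codon-aligned slices of the coding sequence. The Python sketch below assumes that the coding sequence starts at the initiation codon, that its length is a whole number of codons, and that the signal peptide length in amino acids has already been predicted. The function name is hypothetical. Any stop codon remains at the end of the mature-protein portion.

```python
from typing import Dict


def secreted_cds_portions(cds: str, signal_peptide_aa: int) -> Dict[str, str]:
    """Split a coding sequence into full, signal-peptide and mature-protein portions."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    cut = 3 * signal_peptide_aa  # nucleotides encoding the signal peptide
    if not 0 < cut < len(cds):
        raise ValueError("signal peptide must be shorter than the encoded protein")
    return {"full": cds, "signal_peptide": cds[:cut], "mature": cds[cut:]}
```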
[0204] Similarly, nucleic acids containing any other desired portion of the coding sequences for the encoded protein may be obtained. For example, the nucleic acid may contain at least 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive bases of an extended cDNA.

[0205] Once an extended cDNA has been obtained, it can be sequenced to determine the amino acid sequence it encodes. Once the encoded amino acid sequence has been determined, one can create and identify any of the many conceivable cDNAs that will encode that protein by simply using the degeneracy of the genetic code. For example, allelic variants or other homologous nucleic acids can be identified as described below. Alternatively, nucleic acids encoding the desired amino acid sequence can be synthesized in vitro.
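The degeneracy argument in [0205] can be made concrete. In the standard genetic code, the number of DNA sequences encoding a given protein is the product, over its residues, of the number of codons for each residue. The Python sketch below counts these sequences and, for short peptides, enumerates them. It ignores stop codons, and the function names are illustrative.

```python
from itertools import product
from math import prod
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Reverse map: amino acid -> list of synonymous codons.
SYNONYMS: Dict[str, List[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(protein: str) -> int:
    """Number of distinct coding sequences for `protein` (stop codon excluded)."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def enumerate_encodings(protein: str) -> List[str]:
    """All coding sequences for a (short) peptide."""
    return ["".join(codons) for codons in product(*(SYNONYMS[aa] for aa in protein))]
```

For example, the tripeptide LSR already has 6 x 6 x 6 = 216 possible coding sequences.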
[0206] In a preferred embodiment, the coding se- 
quence may be selected using the known codon or co- 
don pair preferences for the host organism in which the 
cDNA is to be expressed. 
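Codon selection by host preference, as in [0206], can be sketched as back-translation that picks the most frequently used synonymous codon for each residue. The usage frequencies below are hypothetical placeholders, not measured values for any organism. Codon-pair preferences are not modelled, and the function name is illustrative.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def back_translate(protein: str, usage: Dict[str, float]) -> str:
    """Encode `protein` with the highest-usage synonymous codon for each residue.

    Codons absent from `usage` count as 0; ties keep the first codon in table order.
    """
    preferred: Dict[str, str] = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in preferred or usage.get(codon, 0.0) > usage.get(preferred[aa], 0.0):
            preferred[aa] = codon
    return "".join(preferred[aa] for aa in protein)


# Hypothetical usage frequencies (per thousand codons) for an unspecified host.
HOST_USAGE = {"CTG": 40.0, "TTA": 5.0, "GCC": 28.0, "GCT": 15.0}
print(back_translate("MLA", HOST_USAGE))  # ATGCTGGCC
```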

[0207] In addition to PCR-based methods for obtaining cDNAs which include the authentic 5' end of the corresponding mRNA as well as the full protein coding sequence of the corresponding mRNA, traditional hybridization-based methods may also be employed. These methods may also be used to obtain the genomic DNAs which encode the mRNAs from which the 5' ESTs or contigated consensus 5' ESTs were derived, mRNAs corresponding to the extended cDNAs, or nucleic acids which are homologous to extended cDNAs, 5' ESTs, or contigated consensus 5' ESTs. Example 18 below provides examples of such methods.
[0208] Each identified ORF may be scanned for the presence of a signal peptide in the first 50 amino acids or, where appropriate, within shorter regions down to 20 amino acids or less in the ORF, using the matrix method of von Heijne (Nucleic Acids Res. 14:4683-4690 (1986)) and the modification described in Example 12.
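The scan in [0208] slides a position-specific weight matrix over candidate cleavage sites near the N terminus. The Python sketch below shows only these mechanics. `TOY_WEIGHTS` is an invented, illustrative matrix that rewards small residues at positions -1 and -3 and hydrophobic residues in the -13 to -6 region. It is neither the published von Heijne matrix nor the modification of Example 12, and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

HYDROPHOBIC = {"L": 1.0, "I": 0.8, "V": 0.8, "F": 0.8, "A": 0.5}

# Illustrative weights keyed by position relative to the cleavage site
# (-1 is the last residue of the signal peptide). NOT the published matrix.
TOY_WEIGHTS: Dict[int, Dict[str, float]] = {
    -3: {"A": 2.0, "V": 1.0, "S": 1.0},
    -1: {"A": 2.0, "G": 1.5, "S": 1.5},
}
for _pos in range(-13, -5):  # hydrophobic core, positions -13 .. -6
    TOY_WEIGHTS[_pos] = HYDROPHOBIC


def site_score(protein: str, cleave: int) -> float:
    """Score a cleavage site; `cleave` is the index of the first mature residue."""
    total = 0.0
    for rel, weights in TOY_WEIGHTS.items():
        i = cleave + rel
        if 0 <= i < len(protein):
            total += weights.get(protein[i], 0.0)
    return total


def best_cleavage_site(protein: str, region: int = 50,
                       min_cleave: int = 14) -> Optional[Tuple[int, float]]:
    """Return (cleavage index, score) of the best site within the first `region` residues."""
    best = None
    for cleave in range(min_cleave, min(region, len(protein)) + 1):
        score = site_score(protein, cleave)
        if best is None or score > best[1]:
            best = (cleave, score)
    return best
```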

d) Homology to either nucleotide or protein sequences 


[0209] Sequences of full-length extended cDNAs are 
then compared to known nucleotide sequences. 
Polypeptides encoded by full-length extended cDNAs 
are then compared to known polypeptide sequences. 

[0210] Sequences of full-length extended cDNAs are compared to known nucleic acid sequences such as the vertebrate and EST sequences of Genbank, EMBL databases and Genseq (Derwent's database of patented nucleotide sequences). Full-length cDNA sequences are also compared to the sequences of a private database (Genset internal sequences) in order to find sequences that have already been identified by applicants. Sequences of full-length extended cDNAs with more than 90% homology over 30 nucleotides using either BLASTN or BLAST2N are identified as sequences that have already been described. Matching vertebrate sequences are subsequently examined using FASTA; full-length extended cDNAs with more than 70% homology over 30 nucleotides are identified as sequences that have already been described.

[0211] ORFs encoded by full-length extended cDNAs as defined in section c) are subsequently compared to known amino acid sequences found in public databases such as Swissprot, PIR and Genpept (Derwent's database of patented protein sequences). These analyses were performed using BLASTP with the parameter W=8 and allowing a maximum of 10 matches. Sequences of full-length extended cDNAs showing extensive homology to known proteins are recognized as already identified proteins.

[0212] In addition, the three-frame conceptual translation products of the top strand of full-length extended cDNAs are compared to publicly known amino acid sequences of Swissprot using BLASTX with the parameter E=0.001. Sequences of full-length extended cDNAs with more than 70% homology over 30 amino acid stretches are detected as already identified proteins.
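The three-frame conceptual translation of the top strand, used in [0212] before the BLASTX comparison, is sketched below in Python with the standard genetic code. Two conventions are assumptions of this sketch: incomplete trailing codons are dropped, and codons containing ambiguous bases are rendered as X.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def three_frame_translation(seq: str) -> List[str]:
    """Translate the top strand in reading frames starting at offsets 0, 1 and 2."""
    dna = seq.upper().replace("U", "T")
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


print(three_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```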



5. Selection of cloned full-length sequences obtained from the 5' ESTs of the present invention



[0213] Cloned full-length extended cDNA sequences that have already been characterized by the aforementioned computer analysis are then submitted to an automatic procedure in order to preselect full-length extended cDNAs containing sequences of interest.

a) Automatic sequence preselection 

[0214] All complete cloned full-length extended cDNAs clipped for vector on both ends are considered. First, a negative selection is operated in order to eliminate unwanted cloned sequences resulting from either contaminants or PCR artifacts as follows. Sequences matching contaminant sequences such as vector DNA, tRNA, mtRNA or rRNA sequences are discarded, as well as those encoding ORF sequences exhibiting extensive homology to repeats as defined in section 4 a). Sequences obtained by direct cloning using nested primers on 5' and 3' tags (section 1, case a) but lacking a polyA tail are discarded. Only ORFs containing a signal peptide and ending either before the polyA tail (case a) or before the end of the cloned 3' UTR (case b) are kept. Then, ORFs containing unlikely mature proteins, such as mature proteins whose size is less than 20 amino acids or less than 25% of the immature protein size, are eliminated.

[0215] Then, for each remaining full-length extended cDNA containing several ORFs, a preselection of ORFs is performed using the following criteria. The longest ORF with a signal peptide is preferred. If the ORF sizes are similar, the chosen ORF is the one whose signal peptide has the highest score according to the von Heijne method.

[0216] Sequences of full-length extended cDNA clones are then compared pairwise with BLAST after masking of the repeat sequences. Sequences containing at least 90% homology over 30 nucleotides are clustered in the same class. Each cluster is then subjected to a cluster analysis that detects sequences resulting from internal priming or from alternative splicing, identical sequences, or sequences with several frameshifts. This automatic analysis serves as a basis for manual selection of the sequences.

b) Manual sequence selection 

[0217] Manual selection can be carried out using automatically generated reports for each sequenced full-length extended cDNA clone. During this manual procedure, a selection is operated between clones belonging to the same class as follows. ORF sequences encoded by clones belonging to the same class are aligned and compared. If the homology between nucleotide sequences of clones belonging to the same class is more than 90% over 30 nucleotide stretches, or if the homology between amino acid sequences of clones belonging to the same class is more than 80% over 20 amino acid stretches, then the clones are considered identical. The chosen ORF is either the one exhibiting matches with known amino acid sequences or the best one according to the criteria mentioned in the automatic sequence preselection section. If the nucleotide and amino acid homologies are less than 90% and 80% respectively, the clones are said to encode distinct proteins, which can both be selected if they contain sequences of interest.

[0218] Selection of full-length extended cDNA clones encoding sequences of interest is performed using the following criteria. Structural parameters (initial tag, polyadenylation site and signal) are first checked. Then, homologies with known nucleic acids and proteins are examined in order to determine whether the clone sequence matches a known nucleotide/protein sequence and, in the latter case, its covering rate and the date at which the sequence became public. If there is no extensive match with sequences other than ESTs or genomic DNA, or if the clone sequence brings substantial new information, such as encoding a protein resulting from alternative splicing of an mRNA coding for an already known protein, the sequence is kept. Examples of such cloned full-length extended cDNAs containing sequences of interest are described in Example 18. Sequences resulting from chimeras or double inserts, or located on chromosome breaking points as assessed by homology to other sequences, are discarded during this procedure.

[0219] Extended cDNAs prepared as described above may be subsequently engineered to obtain nucleic acids which include desired portions of the extended cDNA using conventional techniques such as subcloning, PCR, or in vitro oligonucleotide synthesis. For example, nucleic acids which include only the full coding sequences (i.e. the sequences encoding the signal peptide and the mature protein remaining after the signal peptide is cleaved off) may be obtained using techniques known to those skilled in the art. Alternatively, conventional techniques may be applied to obtain nucleic acids which contain only the coding sequences for the mature protein remaining after the signal peptide is cleaved off, or nucleic acids which contain only the coding sequences for the signal peptides.
[0220] Similarly, nucleic acids containing any other desired portion of the coding sequences for the encoded protein may be obtained. For example, the nucleic acid may contain at least 10, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of an extended cDNA.

[0221] Once an extended cDNA has been obtained, it can be sequenced to determine the amino acid sequence it encodes. Once the encoded amino acid sequence has been determined, one can create and identify any of the many conceivable cDNAs that will encode that protein simply by using the degeneracy of the genetic code. For example, allelic variants or other homologous nucleic acids can be identified as described below. Alternatively, nucleic acids encoding the desired amino acid sequence can be synthesized in vitro.
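To make "the many conceivable cDNAs" concrete, the Python sketch below counts how many distinct DNA sequences encode a given protein under the standard genetic code: the count is the product of the number of synonymous codons for each residue. The helper name `count_coding_sequences` is hypothetical; the codon table is the standard code (NCBI translation table 1), and nonstandard codes used by some organisms or organelles are not modeled.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...), and "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
# Number of synonymous codons per amino acid (e.g. L -> 6, M -> 1).
SYNONYMOUS_CODONS = Counter(CODON_TABLE.values())


def count_coding_sequences(protein: str, include_stop: bool = False) -> int:
    """Number of distinct DNA sequences that encode `protein` (one-letter codes)."""
    counts = [SYNONYMOUS_CODONS[aa] for aa in protein.upper()]
    if not counts or 0 in counts:
        raise ValueError("unknown or empty amino acid sequence")
    total = math.prod(counts)
    if include_stop:
        total *= SYNONYMOUS_CODONS["*"]  # any of the three stop codons
    return total
```

For instance, the 17-residue signal peptide MKKVLLLITAILAVAVG (SEQ ID NO: 4) can be encoded by 3,057,647,616 different nucleotide sequences when the stop codon is excluded.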
[0222] In a preferred embodiment, the coding sequence may be selected using the known codon or codon pair preferences for the host organism in which the cDNA is to be expressed.
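A minimal Python sketch of codon-preference selection is shown below. For each residue it picks the synonymous codon with the highest preference value. The table `HYPOTHETICAL_USAGE` is invented for illustration and covers only a few amino acids; a real implementation would load published codon-usage data for the chosen host. Codon-pair preferences are not modeled here.

```python
# Hypothetical relative codon preferences for an imaginary expression host.
# The numbers are illustrative only; codons omitted here are treated as unused.
HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "V": {"GTT": 0.20, "GTC": 0.25, "GTA": 0.10, "GTG": 0.45},
    "L": {"TTA": 0.10, "TTG": 0.15, "CTT": 0.15, "CTC": 0.20, "CTG": 0.40},
    "I": {"ATT": 0.35, "ATC": 0.50, "ATA": 0.15},
    "T": {"ACT": 0.25, "ACC": 0.35, "ACA": 0.30, "ACG": 0.10},
    "A": {"GCT": 0.27, "GCC": 0.40, "GCA": 0.23, "GCG": 0.10},
    "G": {"GGT": 0.15, "GGC": 0.35, "GGA": 0.25, "GGG": 0.25},
}


def back_translate(protein: str, usage: dict = HYPOTHETICAL_USAGE) -> str:
    """Build a coding sequence using the most preferred codon for each residue."""
    codons = []
    for residue in protein.upper():
        choices = usage.get(residue)
        if not choices:
            raise KeyError(f"no codon preferences for residue {residue!r}")
        # Pick the codon with the highest preference value for this residue.
        codons.append(max(choices, key=choices.get))
    return "".join(codons)
```

With this illustrative table, `back_translate("MK")` returns `"ATGAAG"`. Always choosing the single most frequent codon is only one possible strategy; sampling codons in proportion to their frequencies is a common alternative.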

[0223] In addition to PCR-based methods for obtaining cDNAs which include the authentic 5' end of the corresponding mRNA as well as the complete protein coding sequence of the corresponding mRNA, traditional hybridization-based methods may also be employed. These methods may also be used to obtain the genomic DNAs which encode the mRNAs from which the 5' ESTs or consensus contigated 5' ESTs were derived, mRNAs corresponding to the extended cDNAs, or nucleic acids which are homologous to extended cDNAs, 5' ESTs, or consensus contigated 5' ESTs. Example 18 below provides examples of such methods.

EXAMPLE 18 

Methods for Obtaining Extended cDNAs which Include the Entire Coding Region and the Authentic 5' End of the Corresponding mRNA or Nucleic Acids Homologous to Extended cDNAs, 5' ESTs or Consensus Contigated 5' ESTs

[0224] A full-length cDNA library can be made using the strategies described in Examples 1-4 above by replacing the random nonamer used in Example 2 with an oligo-dT primer. Alternatively, a cDNA library or genomic DNA library may be obtained from a commercial source or made using techniques familiar to those skilled in the art.

[0225] Such cDNA or genomic DNA libraries may be used to isolate extended cDNAs obtained from 5' ESTs or consensus contigated 5' ESTs, or nucleic acids homologous to extended cDNAs, 5' ESTs, or consensus contigated 5' ESTs, as follows. The cDNA library or genomic DNA library is hybridized to a detectable probe. The detectable probe may comprise at least 10, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive nucleotides of the 5' EST, consensus contigated 5' EST, or extended cDNA.
[0226] Techniques for identifying cDNA clones in a cDNA library which hybridize to a given probe sequence are disclosed in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989. The same techniques may be used to isolate genomic DNAs.

[0227] Briefly, cDNA or genomic DNA clones which hybridize to the detectable probe are identified and isolated for further manipulation as follows. The detectable probe described in the preceding paragraph is labeled with a detectable label such as a radioisotope or a fluorescent molecule. Techniques for labeling the probe are well known and include phosphorylation with polynucleotide kinase, nick translation, in vitro transcription, and non-radioactive techniques. The cDNAs or genomic DNAs in the library are transferred to a nitrocellulose or nylon filter and denatured. After blocking of non-specific sites, the filter is incubated with the labeled probe for an amount of time sufficient to allow binding of the probe to cDNAs or genomic DNAs containing a sequence capable of hybridizing thereto.

[0228] By varying the stringency of the hybridization conditions used to identify cDNAs or genomic DNAs which hybridize to the detectable probe, cDNAs or genomic DNAs having different levels of homology to the probe can be identified and isolated as described below.



1. Identification of cDNA or Genomic DNA Sequences Having a High Degree of Homology to the Labeled Probe

[0229] To identify cDNAs or genomic DNAs having a high degree of homology to the probe sequence, the melting temperature of the probe may be calculated using the following formulas:

[0230] For probes between 14 and 70 nucleotides in length, the melting temperature (Tm) is calculated using the formula:

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\,G{+}C) - \frac{600}{N}$$

where N is the length of the probe.

[0231] If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation:

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\,G{+}C) - 0.63\,(\%\,\text{formamide}) - \frac{600}{N}$$

where N is the length of the probe.
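The two melting-temperature formulas of [0230] and [0231] can be evaluated with the Python sketch below. Two assumptions apply. First, the sodium concentration is given in mol/L. Second, the G+C content is entered as a percentage (for example 50 for a probe that is half G and C), which is the usual convention for the 0.41 coefficient. The helper name `probe_tm` is hypothetical, and setting the formamide percentage to 0 reduces the second formula to the first.

```python
import math


def probe_tm(probe: str, sodium_molar: float, formamide_percent: float = 0.0) -> float:
    """Estimate probe Tm (degrees C) from salt, G+C content, formamide and length.

    probe: probe sequence (DNA letters; only G and C are counted)
    sodium_molar: monovalent sodium concentration in mol/L
    formamide_percent: % formamide in the hybridization solution (0 if none)
    """
    n = len(probe)
    if not 14 <= n <= 70:
        raise ValueError("formula stated for probes of 14 to 70 nucleotides")
    if sodium_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    seq = probe.upper()
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return (81.5
            + 16.6 * math.log10(sodium_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / n)
```

For example, a 20-nucleotide probe with 50% G+C in 1 M Na+ gives 81.5 + 0 + 20.5 - 30 = 72°C. Adding 50% formamide lowers this by 0.63 × 50 = 31.5°C, to 40.5°C.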

[0232] Prehybridization may be carried out in either 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon sperm DNA, or 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon sperm DNA, 50% formamide. The formulas for SSC and Denhardt's solutions are listed in Sambrook et al., supra.

[0233] Hybridization is conducted by adding the detectable probe to the prehybridization solutions listed above. Where the probe comprises double-stranded DNA, it is denatured before addition to the hybridization solution. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to extended cDNAs or genomic DNAs containing sequences complementary or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25°C below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 15-25°C below the Tm. Preferably, for hybridizations in 6X SSC, the hybridization is conducted at approximately 68°C. Preferably, for hybridizations in solutions containing 50% formamide, the hybridization is conducted at approximately 42°C.

[0234] All of the foregoing hybridizations would be considered to be under "stringent" conditions.

[0235] Following hybridization, the filter is washed in 2X SSC, 0.1% SDS at room temperature for 15 minutes. The filter is then washed with 0.1X SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour. Thereafter, the filter is washed at the hybridization temperature in 0.1X SSC, 0.5% SDS. A final wash is conducted in 0.1X SSC at room temperature.

[0236] cDNAs or genomic DNAs which have hybrid- 
ized to the probe are identified by autoradiography or 
other conventional techniques. 

2. Obtaining cDNA or Genomic DNA Sequences Having Lower Degrees of Homology to the Labeled Probe

[0237] The above procedure may be modified to identify cDNAs or genomic DNAs having decreasing levels of homology to the probe sequence. For example, to obtain cDNAs or genomic DNAs of decreasing homology to the detectable probe, less stringent conditions may be used. For example, the hybridization temperature may be decreased in increments of 5°C from 68°C to 42°C in a hybridization buffer having a sodium concentration of approximately 1 M. Following hybridization, the filter may be washed with 2X SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be "moderate" conditions above 50°C and "low" conditions below 50°C.

[0238] Alternatively, the hybridization may be carried out in buffers, such as 6X SSC, containing formamide at a temperature of 42°C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with 6X SSC, 0.5% SDS at 50°C. These conditions are considered to be "moderate" conditions above 25% formamide and "low" conditions below 25% formamide.
[0239] cDNAs or genomic DNAs which have hybrid- 
ized to the probe are identified by autoradiography. 
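The stepwise relaxation of stringency described in [0237] and [0238] can be laid out as a simple schedule. The Python sketch below is illustrative and all helper names are hypothetical. Because 5°C steps down from 68°C reach 43°C rather than 42°C, the sketch assumes the series finishes at the stated 42°C lower bound. The text classifies conditions above and below 50°C (or 25% formamide) but not at those exact values, so those are reported as "boundary".

```python
def temperature_steps(start: int = 68, stop: int = 42, step: int = 5) -> list:
    """Hybridization temperatures (deg C) lowered in `step` increments."""
    temps = list(range(start, stop - 1, -step))  # 68, 63, 58, 53, 48, 43
    if temps[-1] != stop:
        temps.append(stop)  # assumption: finish exactly at the stated lower bound
    return temps


def formamide_steps(start: int = 50, stop: int = 0, step: int = 5) -> list:
    """Formamide concentrations (%) lowered in `step` increments."""
    return list(range(start, stop - 1, -step))  # 50, 45, ..., 0


def stringency_by_temperature(temp_c: float) -> str:
    if temp_c > 50:
        return "moderate"
    if temp_c < 50:
        return "low"
    return "boundary"  # the text does not classify exactly 50 deg C


def stringency_by_formamide(formamide_percent: float) -> str:
    if formamide_percent > 25:
        return "moderate"
    if formamide_percent < 25:
        return "low"
    return "boundary"  # the text does not classify exactly 25% formamide
```

A screening plan could then be listed with, for example, `[(t, stringency_by_temperature(t)) for t in temperature_steps()]`.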

3. Determination of the Degree of Homology between the Obtained cDNAs or Genomic DNAs and 5'ESTs, Consensus Contigated 5'ESTs, or Extended cDNAs or Between the Polypeptides Encoded by the Obtained cDNAs or Genomic DNAs and the Polypeptides Encoded by the 5'ESTs, Consensus Contigated 5'ESTs, or Extended cDNAs

[0240] To determine the level of homology between the hybridized cDNA or genomic DNA and the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was derived, the nucleotide sequences of the hybridized nucleic acid and of the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was derived are compared. The sequences of the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was derived and the sequences of the cDNA or genomic DNA which hybridized to the detectable probe may be stored on a computer readable medium as described below and compared to one another using any of a variety of algorithms familiar to those skilled in the art, such as those described below.
[0241] To determine the level of homology between the polypeptide encoded by the hybridizing cDNA or genomic DNA and the polypeptide encoded by the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was derived, the polypeptide sequence encoded by the hybridized nucleic acid and the polypeptide sequence encoded by the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was derived are compared. The sequences of the polypeptide encoded by the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was derived and the polypeptide sequence encoded by the cDNA or genomic DNA which hybridized to the detectable probe may be stored on a computer readable medium as described below and compared to one another using any of a variety of algorithms familiar to those skilled in the art, such as those described below.
[0242] Protein and/or nucleic acid sequence homologies may be evaluated using any of the variety of sequence comparison algorithms and programs known in the art. Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(22):4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Altschul et al., 1993, Nature Genetics 3:266-272).
[0243] In a particularly preferred embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool ("BLAST"), which is well known in the art (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268; Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1993, Nature Genetics 3:266-272; Altschul et al., 1997, Nuc. Acids Res. 25:3389-3402). In particular, five specific BLAST programs are used to perform the following tasks:
(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database;

(2) BLASTN compares a nucleotide query se- 
quence against a nucleotide sequence database; 

(3) BLASTX compares the six-frame conceptual 
translation products of a query nucleotide sequence 
(both strands) against a protein sequence data- 
base; 

(4) TBLASTN compares a query protein sequence 
against a nucleotide sequence database translated 
in all six reading frames (both strands); and 

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
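The list above amounts to a lookup on the query type and the database type. The Python sketch below encodes that choice. The function name `choose_blast_program` is hypothetical. The returned lowercase names correspond to the program names used by the NCBI BLAST+ command-line tools. BLAST3 is not covered.

```python
def choose_blast_program(query_type: str, database_type: str,
                         translate_both: bool = False) -> str:
    """Pick the BLAST program for a query/database pairing.

    query_type, database_type: "protein" or "nucleotide".
    translate_both: for nucleotide vs nucleotide, compare six-frame
    translations of both sides (TBLASTX) instead of raw bases (BLASTN).
    """
    pairing = (query_type, database_type)
    if pairing == ("protein", "protein"):
        return "blastp"
    if pairing == ("nucleotide", "nucleotide"):
        return "tblastx" if translate_both else "blastn"
    if pairing == ("nucleotide", "protein"):
        return "blastx"
    if pairing == ("protein", "nucleotide"):
        return "tblastn"
    raise ValueError(f"unsupported pairing: {pairing}")
```

For a hybridizing cDNA whose deduced protein is to be compared against a protein database, one would select `choose_blast_program("nucleotide", "protein")`, that is, BLASTX.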



[0244] The BLAST programs identify homologous sequences by identifying similar segments, which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix (Gonnet et al., 1992, Science 256:1443-1445; Henikoff and Henikoff, 1993, Proteins 17:49-61). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation).
[0245] The BLAST programs evaluate the statistical significance of all high-scoring segment pairs identified, and preferably select those segments which satisfy a user-specified threshold of significance, such as a user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268).
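For orientation, the Karlin-Altschul statistics relate the raw score $S$ of a high-scoring segment pair to the number of such pairs expected by chance. In the usual formulation, $m$ and $n$ are the lengths of the query and of the database (or test sequence), and $K$ and $\lambda$ are parameters fixed by the scoring matrix and residue composition. The expected value $E$ and the normalized ("bit") score $S'$ are then:

$$E = K\,m\,n\,e^{-\lambda S}, \qquad S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}$$

A smaller $E$ means the segment pair is less likely to have arisen by chance, so a significance threshold of the kind described above can be expressed as a maximum acceptable $E$.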

[0246] The parameters used with the above algorithms may be adapted depending on the sequence length and degree of homology studied. In some embodiments, the parameters may be the default parameters used by the algorithms in the absence of instructions from the user.

[0247] In some embodiments, the level of homology between the hybridized nucleic acid and the extended cDNA, 5'EST, or 5' consensus contigated EST from which the probe was derived may be determined using the FASTDB algorithm described in Brutlag et al., Comp. App. Biosci. 6:237-245, 1990. In such analyses the parameters may be selected as follows: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining Penalty=30, Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the sequence which hybridizes to the probe, whichever is shorter. Because the FASTDB program does not consider 5' or 3' truncations when calculating homology levels, if the sequence which hybridizes to the probe is truncated relative to the sequence of the extended cDNA, 5'EST, or consensus contigated 5'EST from which the probe was derived, the homology level is manually adjusted. This is done by calculating the number of nucleotides of the extended cDNA, 5'EST or consensus contigated 5' EST which are not matched or aligned with the hybridizing sequence, determining the percentage of total nucleotides of the hybridizing sequence which the non-matched or non-aligned nucleotides represent, and subtracting this percentage from the homology level. For example, suppose the hybridizing sequence is 700 nucleotides in length and the extended cDNA, 5'EST or consensus contigated 5' EST sequence is 1000 nucleotides in length, the first 300 bases at the 5' end of the extended cDNA, 5'EST, or consensus contigated 5' EST are absent from the hybridizing sequence, and the overlapping 700 nucleotides are identical. The homology level would then be adjusted as follows. The non-matched, non-aligned 300 bases represent 30% of the length of the extended cDNA, 5'EST, or consensus contigated 5' EST. If the overlapping 700 nucleotides are 100% identical, the adjusted homology level would be 100-30=70% homology. It should be noted that the preceding adjustments are only made when the non-matched or non-aligned nucleotides are at the 5' or 3' ends. No adjustments are made if the non-matched or non-aligned sequences are internal or under any other conditions.
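The manual truncation correction can be written as a small Python function. This is a sketch with a hypothetical name, `adjusted_homology`. Following the worked example, the unaligned terminal residues are expressed as a percentage of the length of the extended cDNA, 5'EST or consensus contigated 5'EST (the reference), and internal gaps are deliberately ignored.

```python
def adjusted_homology(raw_homology: float, reference_length: int,
                      unaligned_start: int = 0, unaligned_end: int = 0) -> float:
    """Subtract the share of terminally unaligned reference residues from a homology score.

    raw_homology: percent identity reported over the aligned region
    reference_length: length of the extended cDNA/5'EST (or of its encoded protein)
    unaligned_start / unaligned_end: reference residues missing at the 5' (N-terminal)
        and 3' (C-terminal) ends of the alignment; internal gaps are not counted
    """
    if reference_length <= 0:
        raise ValueError("reference length must be positive")
    terminal = unaligned_start + unaligned_end
    if unaligned_start < 0 or unaligned_end < 0 or terminal > reference_length:
        raise ValueError("unaligned residues must lie within the reference")
    return raw_homology - 100.0 * terminal / reference_length
```

`adjusted_homology(100.0, 1000, unaligned_start=300)` reproduces the 70% figure above. The same arithmetic applies to the polypeptide correction described for N-terminal and C-terminal deletions, where `adjusted_homology(100.0, 100, unaligned_start=20)` gives 80%.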

[0248] For example, using the above methods, nucleic acids having at least 95% nucleic acid homology, at least 96% nucleic acid homology, at least 97% nucleic acid homology, at least 98% nucleic acid homology, at least 99% nucleic acid homology, or more than 99% nucleic acid homology to the extended cDNA, 5'EST, or consensus contigated 5' EST from which the probe was derived may be obtained and identified. Such nucleic acids may be allelic variants or related nucleic acids from other species. Similarly, by using progressively less stringent hybridization conditions, one can obtain and identify nucleic acids having at least 90%, at least 85%, at least 80% or at least 75% homology to the extended cDNA, 5'EST or consensus contigated 5' EST from which the probe was derived.
[0249] Using the above methods and algorithms such as FASTA, with parameters depending on the sequence length and degree of homology studied (for example, the default parameters used by the algorithms in the absence of instructions from the user), one can obtain nucleic acids encoding proteins having at least 99%, at least 98%, at least 97%, at least 96%, at least 95%, at least 90%, at least 85%, at least 80% or at least 75% homology to the protein encoded by the extended cDNA, 5'EST, or consensus contigated 5' EST from which the probe was derived. In some embodiments, the homology levels can be determined using the "default" opening penalty and the "default" gap penalty, and a scoring matrix such as PAM 250 (a standard scoring matrix; see Dayhoff et al., in: Atlas of Protein Sequence and Structure, Vol. 5, Supp. 3 (1978)).
[0250] Alternatively, the level of polypeptide homology may be determined using the FASTDB algorithm described by Brutlag et al., Comp. App. Biosci. 6:237-245, 1990. In such analyses the parameters may be selected as follows: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization Group Length=0, Cutoff Score=1, Window Size=Sequence Length, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the homologous sequence, whichever is shorter. If the homologous amino acid sequence is shorter than the amino acid sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5' EST as a result of an N-terminal and/or C-terminal deletion, the results may be manually corrected as follows. First, the number of amino acid residues of the amino acid sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5' EST which are not matched or aligned with the homologous sequence is determined. Then, the percentage of the length of the sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5' EST which the non-matched or non-aligned amino acids represent is calculated. This percentage is subtracted from the homology level. For example, where the amino acid sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5' EST is 100 amino acids in length, the length of the homologous sequence is 80 amino acids, and the amino acid sequence encoded by the extended cDNA or 5'EST is truncated at the N-terminal end with respect to the homologous sequence, the homology level is calculated as follows. In the preceding scenario there are 20 non-matched, non-aligned amino acids in the sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5' EST. This represents 20% of the length of the amino acid sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5' EST. If the remaining amino acids are 100% identical between the two sequences, the homology level would be 100%-20%=80% homology. No adjustments are made if the non-matched or non-aligned sequences are internal or under any other conditions.

[0251] In addition to the above-described methods, other protocols are available to obtain extended cDNAs using 5' ESTs or consensus contigated 5' ESTs, as outlined in the following paragraphs.
[0252] Extended cDNAs may be prepared by obtain- 
ing mRNA from the tissue, cell, or organism of interest 
using mRNA preparation procedures utilizing polyA se- 
lection procedures or other techniques known to those 
skilled in the art. A first primer capable of hybridizing to 
the polyA tail of the mRNA is hybridized to the mRNA 
and a reverse transcription reaction is performed to gen- 
erate a first cDNA strand. 

[0253] The first cDNA strand is hybridized to a second primer containing at least 10 consecutive nucleotides of the sequences of SEQ ID NOs 24-4100 and 8178-36681. Preferably, the primer comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides from the sequences of SEQ ID NOs 24-4100 and 8178-36681. In some embodiments, the primer comprises more than 30 nucleotides from the sequences of SEQ ID NOs 24-4100 and 8178-36681. If it is desired to obtain extended cDNAs containing the full protein coding sequence, including the authentic translation initiation site, the second primer used contains sequences located upstream of the translation initiation site. The second primer is extended to generate a second cDNA strand complementary to the first cDNA strand. Alternatively, RT-PCR may be performed as described above using primers from both ends of the cDNA to be obtained.

[0254] Extended cDNAs containing 5' fragments of the mRNA may be prepared by hybridizing an mRNA comprising the sequences of SEQ ID NOs: 24-4100 and 8178-36681 with a primer comprising a sequence complementary to a fragment of an EST-related nucleic acid, hybridizing the primer to the mRNAs, and reverse transcribing the hybridized primer to make a first cDNA strand from the mRNAs. Preferably, the primer comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides of the sequences complementary to SEQ ID NOs: 24-4100 and 8178-36681.
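A primer that anneals to an mRNA must be the reverse complement of the corresponding stretch of the sense sequence. The Python sketch below enumerates such primers of a chosen length. It assumes the sense sequence is written in the DNA alphabet (A, C, G, T), as sequence listings usually are. The function names are hypothetical, and the example input is a made-up string, not one of the listed SEQ ID NOs. Practical primer design would also screen candidates for melting temperature, self-complementarity and uniqueness.

```python
# DNA complement map; the sketch assumes sequences are written with A, C, G, T.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_primers(sense_seq: str, length: int = 20):
    """Yield (start, primer) pairs: every `length`-nt window of the sense
    sequence, written as the complementary strand (5'->3') that can anneal
    to the mRNA and prime reverse transcription."""
    if length < 10:
        raise ValueError("the primers discussed contain at least 10 nucleotides")
    for start in range(len(sense_seq) - length + 1):
        yield start, reverse_complement(sense_seq[start:start + length])
```

For example, `list(antisense_primers("AAAAACCCCCGGGGG", 10))` yields six candidate 10-mers, the first being `(0, "GGGGGTTTTT")`.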



[0255] Thereafter, a second cDNA strand complementary to the first cDNA strand is synthesized. The second cDNA strand may be made by hybridizing a primer complementary to sequences in the first cDNA strand to the first cDNA strand and extending the primer to generate the second cDNA strand.

[0256] The double stranded extended cDNAs made 
using the methods described above are isolated and 
cloned. The extended cDNAs may be cloned into vec- 
tors such as plasmids or viral vectors capable of repli- 
cating in an appropriate host cell. For example, the host 
cell may be a bacterial, mammalian, avian, or insect cell. 
[0257] Techniques for isolating mRNA, reverse transcribing a primer hybridized to mRNA to generate a first cDNA strand, extending a primer to make a second cDNA strand complementary to the first cDNA strand, isolating the double-stranded cDNA and cloning the double-stranded cDNA are well known to those skilled in the art and are described in Current Protocols in Molecular Biology, John Wiley & Sons, Inc., 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989.
[0258] Alternatively, other procedures may be used for obtaining full-length cDNAs or extended cDNAs. In one approach, full-length or extended cDNAs are prepared from mRNA and cloned into double-stranded phagemids as follows. The cDNA library in the double-stranded phagemids is then rendered single-stranded by treatment with an endonuclease, such as the Gene II product of the phage F1, and an exonuclease (Chang et al., Gene 127:95-8, 1993). A biotinylated oligonucleotide comprising the sequence of a fragment of an EST-related nucleic acid is hybridized to the single-stranded phagemids. Preferably, the fragment comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides of the sequences of SEQ ID NOs: 24-4100 and 8178-36681.

[0259] Hybrids between the biotinylated oligonucleotide and phagemids are isolated by incubating the hybrids with streptavidin-coated paramagnetic beads and retrieving the beads with a magnet (Fry et al., Biotechniques, 13:124-131, 1992). Thereafter, the resulting phagemids are released from the beads and converted into double-stranded DNA using a primer specific for the 5' EST or consensus contigated 5' EST sequence used to design the biotinylated oligonucleotide. Alternatively, protocols such as the Gene Trapper kit (Gibco BRL) may be used. The resulting double-stranded DNA is transformed into bacteria. Extended cDNAs or full-length cDNAs containing the 5' EST or consensus contigated 5' EST sequence are identified by colony PCR or colony hybridization.

[0260] Using any of the above-described methods in section III, a plurality of extended cDNAs containing full-length protein coding sequences or portions of the protein coding sequences may be provided as cDNA libraries for subsequent evaluation of the encoded proteins or for use in diagnostic assays as described below.
EXAMPLE 19

Full Length cDNAs 

[0261] The procedures described in Examples 17 and 18 were used to obtain 376 extended cDNAs or full-length cDNAs derived from 5' ESTs in a variety of tissues. The following list provides a few examples of cDNAs thus obtained.

[0262] Using this procedure, the full length cDNA of 
SEQ ID NO:1 (internal identification number 
58-34-2-E7-FL2) was obtained. This cDNA encodes the 
signal peptide MWWFQQGLSFLPSALVIWTSA (SEQ 
ID NO: 2) having a von Heijne score of 5.5. 
[0263] Using this approach, the full length cDNA of SEQ ID NO: 3 (internal identification number 48-19-3-G1-FL1) was obtained. This cDNA encodes the signal peptide MKKVLLLITAILAVAVG (SEQ ID NO: 4) having a von Heijne score of 8.2.
[0264] The full length cDNA of SEQ ID NO:5 (internal identification number 58-35-2-F10-FL2) was also obtained using this procedure. This cDNA encodes a signal peptide LWLLFFLVTAIHA (SEQ ID NO:6) having a von Heijne score of 10.7.

[0265] Furthermore, the polypeptides encoded by the extended or full-length cDNAs may be screened for the presence of known structural or functional motifs or for the presence of signatures, which are short amino acid sequences well conserved amongst the members of a protein family. The results obtained for the polypeptides encoded by a few full-length cDNAs derived from 5'ESTs, which were screened for the presence of known protein signatures and motifs using the Proscan software from the GCG package and the Prosite 15.0 database, are provided below.

[0266] The protein of SEQ ID NO: 8 encoded by the full-length cDNA SEQ ID NO: 7 (internal designation 78-8-3-E6-CL0_1C) and expressed in adult prostate belongs to the phosphatidylethanolamine-binding protein family, whose characteristic PROSITE signature it exhibits from positions 90 to 112. Proteins from this widespread family, found in species ranging from nematodes to fly, yeast, rodents and primates, bind hydrophobic ligands such as phospholipids and nucleotides. They are mostly expressed in brain and in testis and are thought to play a role in cell growth and/or maturation, in regulation of sperm maturation and motility, and in membrane remodeling. They may act either through signal transduction or through oxidoreduction reactions (for a review see Schoentgen and Jollès, FEBS Letters, 369:22-26 (1995)). Taken together, these data suggest that the protein of SEQ ID NO: 8 may play a role in cell growth, maturation and membrane remodeling and/or may be related to male fertility. Thus, this protein may be useful in diagnosing and/or treating cancer, neurodegenerative diseases, and/or disorders related to male fertility and sterility.

[0267] The protein of SEQ ID NO: 10 encoded by the full-length cDNA SEQ ID NO: 9 (internal designation 108-013-5-0-H9-FLC) shows homologies with a family of lysophospholipases conserved among eukaryotes (yeast, rabbit, rodents and human). In addition, some members of this family exhibit a calcium-independent phospholipase A2 activity (Portilla et al., J. Am. Soc. Nephrol., 9:1178-1186 (1998)). All members of this family exhibit the active site consensus GXSXG motif of carboxylesterases, which is also found in the protein of SEQ ID NO: 10 (positions 54 to 58). In addition, this protein may be a membrane protein with one transmembrane domain, as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes, 10:685-686 (1994)). Taken together, these data suggest that the protein of SEQ ID NO: 10 may play a role in fatty acid metabolism, probably as a phospholipase. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer, diabetes, and neurodegenerative disorders such as Parkinson's and Alzheimer's diseases. It may also be useful in modulating inflammatory responses to infectious agents and/or in suppressing graft rejection.
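Signature screening of the kind described in [0265] can be approximated with regular expressions. In PROSITE pattern syntax, the GXSXG motif is written G-x-S-x-G. The Python sketch below converts a small subset of PROSITE pattern syntax into a regular expression and reports 1-based match positions. The function names are hypothetical. Anchors and other advanced PROSITE features are not handled. Because the sequence of SEQ ID NO: 10 is not reproduced here, the example uses a made-up string. This is not the Proscan software mentioned above.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'G-x-S-x-G' into a regex.

    Supports single residues, x, [..], {..} and repeat counts x(n) / x(n,m);
    anchors (<, >) and other PROSITE features are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if high is not None:
            core += "{%s,%s}" % (low, high)
        elif low is not None:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_motif(protein: str, prosite_pattern: str) -> list:
    """Return (start, end, match) for every occurrence, using 1-based positions.

    A zero-width lookahead lets overlapping occurrences be reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(prosite_pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)), m.group(1))
            for m in regex.finditer(protein.upper())]
```

For example, `find_motif("AAGHSLGAA", "G-x-S-x-G")` reports a single hit, `(3, 7, "GHSLG")`, in the same 1-based style as the "positions 54 to 58" cited for SEQ ID NO: 10.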
[0268] The protein of SEQ ID NO: 12 encoded by the full-length cDNA SEQ ID NO: 11 (internal designation 108-004-5-0-D10-FLC) shows remote homology to a subfamily of beta4-galactosyltransferases widely conserved in animals (human, rodents, cow and chicken). Such enzymes, usually type II membrane proteins located in the endoplasmic reticulum or in the Golgi apparatus, catalyze the biosynthesis of glycoproteins, glycolipid glycans and lactose. Their characteristic features, defined as those of subfamily A in Breton et al., J. Biochem., 123:1000-1009 (1998), are well conserved in the protein of SEQ ID NO: 12, especially the region containing the DVD motif (positions 163-165) thought to be involved either in UDP binding or in the catalytic process itself. In addition, the protein of SEQ ID NO: 12 has the typical structure of a type II protein. Indeed, it contains a short 28-amino-acid-long N-terminal tail, a transmembrane segment from positions 29 to 49 and a large 278-amino-acid-long C-terminal tail, as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes, 10:685-686 (1994)). Taken together, these data suggest that the protein of SEQ ID NO: 12 may play a role in the biosynthesis of polysaccharides and of the carbohydrate moieties of glycoproteins and glycolipids and/or in cell-cell recognition. Thus, this protein may be useful in diagnosing and/or treating several types of disorders including, but not limited to, cancer, atherosclerosis, cardiovascular disorders, autoimmune disorders and rheumatic diseases including rheumatoid arthritis.

[0269] The protein of SEQ ID NO: 14 encoded by the full-length cDNA SEQ ID NO: 13 (internal designation 108-009-5-0-A2-FLC) shows extensive homology to the bZIP family of transcription factors, and especially to the human Luman protein (Lu et al., Mol. Cell. Biol., 17:5117-5126 (1997)). The match includes the whole bZIP domain, composed of a basic DNA-binding domain and of a leucine zipper allowing protein dimerization. The basic domain is conserved in the protein of SEQ ID NO: 14, as shown by the characteristic PROSITE signature (positions 224-237), except for a conservative substitution of a glutamic acid with an aspartic acid at position 233. The typical PROSITE signature for leucine zippers is also present (positions 259 to 280). Taken together, these data suggest that the protein of SEQ ID NO: 14 may bind to DNA, hence regulating gene expression as a transcription factor. Thus, this protein may be useful in diagnosing and/or treating several types of disorders including, but not limited to, cancer.
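The leucine zipper signature mentioned above corresponds to the PROSITE pattern PS00029, L-x(6)-L-x(6)-L-x(6)-L, which spans 22 residues, consistent with the reported span from position 259 to 280. A minimal sketch of translating such a pattern into a Python regular expression and scanning a sequence is shown below; it handles only the basic pattern syntax (no < or > anchors), it is not the ScanProsite tool, and the toy sequence is invented.

```python
import re

# One PROSITE element: residue, x, [alternatives] or {exclusions}, optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if high is not None:
            core += "{" + low + "," + high + "}"
        elif low is not None:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

def find_motifs(seq, pattern):
    """Return 1-based (start, end) spans of non-overlapping matches in seq."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in regex.finditer(seq.upper())]

LEUCINE_ZIPPER = "L-x(6)-L-x(6)-L-x(6)-L"  # PROSITE PS00029
toy = "MSDEKR" + "LKEAEKRLQEAKEKLEEAKKRL" + "GGSP"
print(prosite_to_regex(LEUCINE_ZIPPER))  # L.{6}L.{6}L.{6}L
print(find_motifs(toy, LEUCINE_ZIPPER))  # [(7, 28)]
```

Because four regularly spaced leucines can also occur by chance, a hit of this kind is supporting evidence for a zipper rather than proof of one.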
[0270] Bacterial clones containing plasmids containing the full-length cDNAs described above are presently stored in the inventor's laboratories under the internal identification numbers provided above. The inserts may be recovered from the deposited materials by growing an aliquot of the appropriate bacterial clone in the appropriate medium. The plasmid DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art, such as alkaline lysis minipreps or large-scale alkaline lysis plasmid isolation procedures. If desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done with primers designed at both ends of the EST insertion. The PCR product, which corresponds to the 5' EST, can then be manipulated using standard cloning techniques familiar to those skilled in the art.

IV. Expression of Proteins 

[0271] EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, and fragments of positional segments of EST-related nucleic acids may be used to express the polypeptides which they encode. In particular, they may be used to express EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. In some embodiments, the EST-related nucleic acids, positional segments of EST-related nucleic acids, and fragments of positional segments of EST-related nucleic acids may be used to express the full polypeptide (i.e. the signal peptide and the mature polypeptide) of a secreted protein, the mature protein (i.e. the polypeptide generated after cleavage of the signal peptide), or the signal peptide of a secreted protein. If desired, nucleic acids encoding the signal peptide may be used to facilitate secretion of the expressed protein. It will be appreciated that a plurality of EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids may be simultaneously cloned into expression vectors to create an expression library for analysis of the encoded proteins as described below.

EXAMPLE 20

Expression of the Proteins Encoded by the Genes Corresponding to the 5' ESTs or Consensus Contigated 5' ESTs

[0272] To express their encoded proteins, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are cloned into a suitable expression vector. In some instances, nucleic acids encoding EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides may be cloned into a suitable expression vector.

[0273] In some embodiments, the nucleic acids inserted into the expression vector may comprise the coding sequence of a sequence selected from the group consisting of SEQ ID NOs: 24-4100. In other embodiments, the nucleic acids inserted into the expression vector may comprise the full coding sequence (i.e. the nucleotides encoding the signal peptide and the mature polypeptide) of one of SEQ ID NOs: 3721-3811. In some embodiments, the nucleic acid inserted into the expression vector may comprise the nucleotides of one of the sequences of SEQ ID NOs: 3721-3811 which encode the mature polypeptide (i.e. the nucleotides encoding the polypeptide generated after cleavage of the signal peptide). In further embodiments, the nucleic acids inserted into the expression vector may comprise the nucleotides of one of SEQ ID NOs: 24-652 and 3721-3811 which encode the signal peptide, to facilitate secretion of the expressed protein. The nucleic acids inserted into the expression vectors may also contain sequences upstream of the sequences encoding the signal peptide, such as sequences which regulate expression levels or sequences which confer tissue-specific expression.
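The paragraph above distinguishes the nucleotides encoding the signal peptide from those encoding the mature polypeptide. A minimal sketch, assuming the cleavage position is already known (it is simply passed in here as a hypothetical `signal_length`), splits a coding sequence at that boundary and translates each part with the standard genetic code. The toy coding sequence is invented for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate an in-frame coding sequence up to (not including) the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def split_cds(cds, signal_length):
    """Split a CDS into signal-peptide nucleotides and mature-polypeptide nucleotides.

    signal_length (number of signal-peptide residues) is a hypothetical input;
    in practice it would come from a cleavage-site prediction or experiment.
    """
    cut = 3 * signal_length
    return cds[:cut], cds[cut:]

cds = "ATGAAACTGCTGGCCTTC" + "GATGAAAGCTGA"  # invented: 6 "signal" codons, 3 codons, stop
signal_nt, mature_nt = split_cds(cds, signal_length=6)
print(translate(cds))        # MKLLAFDES
print(translate(signal_nt))  # MKLLAF
print(translate(mature_nt))  # DES
```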

[0274] The nucleic acid inserted into the expression vector may encode a polypeptide comprising one of the sequences of SEQ ID NOs: 4101-8177. In some embodiments, the nucleic acid inserted into the expression vector may encode the full polypeptide sequence (i.e. the signal peptide and the mature polypeptide) included in one of SEQ ID NOs: 7798-7888. In other embodiments, the nucleic acid inserted into the expression vector may encode the mature polypeptide (i.e. the polypeptide generated after cleavage of the signal peptide) included in one of the sequences of SEQ ID NOs: 7798-7888. In further embodiments, the nucleic acids inserted into the expression vector may encode the signal peptide included in one of the sequences of SEQ ID NOs: 4101-4729 and 7798-7888.

[0275] The nucleic acid encoding the protein or polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional cloning technology. The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems known in the art. Vectors and expression systems are commercially available from a variety of suppliers, including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism into which the expression vector is introduced, as explained by Hatfield et al., U.S. Patent No. 5,082,767.

[0276] The following is provided as one exemplary method to express the proteins encoded by the nucleic acids described above. In some instances, the nucleic acid encoding the protein or polypeptide to be expressed includes a methionine initiation codon and a polyA signal. If the nucleic acid encoding the polypeptide to be expressed lacks a methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional techniques. Similarly, if the nucleic acid encoding the protein or polypeptide to be expressed lacks a polyA signal, this sequence can be added to the construct by, for example, splicing out the polyA signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The nucleic acid encoding the polypeptide to be expressed is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the nucleic acid encoding the protein or polypeptide to be expressed and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and for BglII at the 5' end of the 3' primer, taking care to ensure that the nucleic acid encoding the protein or polypeptide to be expressed is correctly positioned with respect to the polyA signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a polyA signal and digested with BglII.
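Because the insert is amplified with a PstI site in the 5' primer and a BglII site in the 3' primer, any internal PstI or BglII site in the insert would also be cut during digestion. A small check for such sites is sketched below. The recognition sequences CTGCAG (PstI) and AGATCT (BglII) are the standard ones; the function and the test sequences are illustrative only.

```python
# Standard recognition sequences; both are palindromic, so scanning one strand suffices.
SITES = {"PstI": "CTGCAG", "BglII": "AGATCT"}

def internal_sites(insert, sites=SITES):
    """Return {enzyme: [1-based start positions]} for every site found in the insert."""
    insert = insert.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = [
            i + 1
            for i in range(len(insert) - len(site) + 1)
            if insert[i:i + len(site)] == site
        ]
        if positions:
            hits[enzyme] = positions
    return hits

print(internal_sites("ATGCTGCAGTTTAGATCTGG"))  # {'PstI': [4], 'BglII': [13]}
print(internal_sites("ATGAAACCCGGGTTT"))       # {}
```

An empty result suggests that both enzymes will cut only at the sites introduced by the primers.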
[0277] The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).

[0278] Alternatively, the nucleic acid encoding the protein or polypeptide to be expressed may be cloned into pED6dpc2 as described above. The resulting pED6dpc2 constructs may be transfected into a suitable host cell, such as COS 1 cells. Methotrexate-resistant cells are selected and expanded. The expressed protein or polypeptide may be isolated, purified, or enriched as described above.

[0279] To confirm expression of the desired protein or polypeptide, the proteins or polypeptides produced by cells containing a vector with a nucleic acid insert encoding the protein or polypeptide are compared to those produced by cells lacking such an insert. The expressed proteins are detected using techniques familiar to those skilled in the art, such as Coomassie blue or silver staining, or using antibodies against the protein or polypeptide encoded by the nucleic acid insert. Antibodies capable of specifically recognizing the protein of interest may be generated using synthetic 15-mer peptides having a sequence encoded by the appropriate nucleic acid. The synthetic peptides are injected into mice to generate antibody to the polypeptide encoded by the nucleic acid.
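Candidate 15-mer peptides for immunization can be enumerated by sliding a window along the encoded polypeptide. The sketch below is illustrative: the window step and the option to skip an N-terminal signal peptide (absent from the secreted mature protein) are assumptions, and other selection criteria, such as predicted antigenicity or solubility, are not modeled.

```python
def candidate_peptides(protein, length=15, step=5, skip_n_terminal=0):
    """Return (1-based start, peptide) pairs for windows of `length` residues.

    Windows start every `step` residues; the first `skip_n_terminal` residues
    (for example a signal peptide) are excluded.
    """
    return [
        (i + 1, protein[i:i + length])
        for i in range(skip_n_terminal, len(protein) - length + 1, step)
    ]

toy = "ACDEFGHIKLMNPQRSTVWY"  # invented 20-residue example
print(candidate_peptides(toy))
# [(1, 'ACDEFGHIKLMNPQR'), (6, 'GHIKLMNPQRSTVWY')]
print(candidate_peptides(toy, skip_n_terminal=3))
# [(4, 'EFGHIKLMNPQRSTV')]
```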
[0280] If the proteins or polypeptides encoded by the nucleic acid inserts are secreted, medium prepared from the host cells or organisms containing an expression vector which contains a nucleic acid insert encoding the desired protein or polypeptide is compared to medium prepared from the control cells or organism. The presence of a band in medium from the cells containing the nucleic acid insert which is absent from preparations from the control cells indicates that the protein or polypeptide encoded by the nucleic acid insert is being expressed and secreted. Generally, the band corresponding to the protein encoded by the nucleic acid insert will have a mobility near that expected based on the number of amino acids in the open reading frame of the nucleic acid insert. However, the band may have a mobility different than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
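The expected mobility can be estimated from the open reading frame using the common rule of thumb of about 110 Da per amino acid residue; this average is an approximation, not a value from the text. As a worked example, the topology described above for SEQ ID NO: 12 (a 28-residue N-terminal tail, a 21-residue transmembrane segment at positions 29 to 49 and a 278-residue C-terminal tail) implies 327 residues, or roughly 36 kDa before any glycosylation.

```python
AVERAGE_RESIDUE_DA = 110.0  # rule-of-thumb average mass of one amino acid residue

def expected_kda(orf_nucleotides, includes_stop=True):
    """Approximate mass (kDa) of the unmodified polypeptide encoded by an ORF."""
    codons = orf_nucleotides // 3
    residues = codons - 1 if includes_stop else codons
    return residues * AVERAGE_RESIDUE_DA / 1000.0

residues = 28 + 21 + 278                   # topology reported for SEQ ID NO: 12
orf_length = 3 * (residues + 1)            # 327 sense codons plus a stop codon
print(residues, orf_length)                # 327 984
print(round(expected_kda(orf_length), 1))  # 36.0
```

A band running well above such an estimate would be consistent with the glycosylation or other modifications mentioned in the paragraph.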

[0281] Alternatively, if the protein expressed from the above expression vectors does not contain sequences directing its secretion, the proteins expressed from host cells containing an expression vector with an insert encoding a secreted protein or portion thereof can be compared to the proteins expressed in control host cells containing the expression vector without an insert. The presence of a band in samples from cells containing the expression vector with an insert which is absent in samples from cells containing the expression vector without an insert indicates that the desired protein or portion thereof is being expressed. Generally, the band will have the mobility expected for the secreted protein or portion thereof. However, the band may have a mobility different than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.

[0282] The expressed protein or polypeptide may be purified, isolated or enriched using a variety of methods. In some methods, the protein or polypeptide may be secreted into the culture medium via a native signal peptide or a heterologous signal peptide operably linked thereto. In some methods, the protein or polypeptide may be linked to a heterologous polypeptide which facilitates its isolation, purification, or enrichment, such as a nickel-binding polypeptide. The protein or polypeptide may also be obtained by gel electrophoresis, ion exchange chromatography, size chromatography, HPLC, salt precipitation, immunoprecipitation, a combination of any of the preceding methods, or any of the isolation, purification, or enrichment techniques familiar to those skilled in the art.

[0283] The protein encoded by the nucleic acid insert may also be purified using standard immunochromatography techniques, such as immunoaffinity chromatography with antibodies directed against the encoded protein or polypeptide, as described in more detail below. If antibody production is not possible, the nucleic acid insert encoding the desired protein or polypeptide may be incorporated into expression vectors designed for use in purification schemes employing chimeric polypeptides. In such strategies, the coding sequence of the nucleic acid insert is ligated in frame with the gene encoding the other half of the chimera. The other half of the chimera may be β-globin or a nickel-binding polypeptide. A chromatography matrix having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein. Protease cleavage sites may be engineered between the β-globin gene or the nickel-binding polypeptide and the extended cDNA or portion thereof. Thus, the two polypeptides of the chimera may be separated from one another by protease digestion.
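Separating the two halves of the chimera by protease digestion can be modeled as splitting the sequence at the engineered site. The text does not name a protease; the sketch below assumes, purely for illustration, a TEV protease site (ENLYFQ, cleaved after the Q) placed between a hypothetical hexahistidine tag, one common kind of nickel-binding polypeptide, and the target. It rejects constructs in which the site occurs more than once, since a second site would also be cut.

```python
TEV_SITE = "ENLYFQ"  # assumed TEV recognition site; cleavage occurs after the final Q

def cleave_chimera(chimera, site=TEV_SITE):
    """Split a chimeric polypeptide just after its single protease site."""
    count = chimera.count(site)
    if count != 1:
        raise ValueError(f"expected exactly one cleavage site, found {count}")
    cut = chimera.index(site) + len(site)
    return chimera[:cut], chimera[cut:]

# Hypothetical construct: Met + six histidines (nickel-binding tag), TEV site, target.
tag, target = cleave_chimera("MHHHHHH" + "ENLYFQ" + "GAPDEK")
print(tag)     # MHHHHHHENLYFQ
print(target)  # GAPDEK
```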
[0284] One useful expression vector for generating β-globin chimeras is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey, eds., Elsevier Press, NY, 1986), and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be produced from the construct using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).

[0285] Following expression and purification of the proteins or polypeptides encoded by the nucleic acid inserts, the purified proteins may be tested for the ability to bind to the surface of various cell types as described in Example 21 below. It will be appreciated that a plurality of proteins expressed from these nucleic acid inserts may be included in a panel of proteins to be simultaneously evaluated for the activities specifically described below, as well as other biological roles for which assays for determining activity are available.
EXAMPLE 21 

Analysis of Secreted Proteins to Determine Whether They Bind to the Cell Surface

[0286] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, fragments of positional segments of EST-related nucleic acids, nucleic acids encoding the EST-related polypeptides, nucleic acids encoding fragments of the EST-related polypeptides, nucleic acids encoding positional segments of EST-related polypeptides, or nucleic acids encoding fragments of positional segments of EST-related polypeptides are cloned into expression vectors such as those described in Example 20. The encoded proteins or polypeptides are purified, isolated, or enriched as described above. Following purification, isolation, or enrichment, the proteins or polypeptides are labeled using techniques known to those skilled in the art. The labeled proteins or polypeptides are incubated with cells or cell lines derived from a variety of organs or tissues to allow the proteins to bind to any receptor present on the cell surface. Following the incubation, the cells are washed to remove non-specifically bound proteins or polypeptides. The specifically bound labeled proteins or polypeptides are detected by autoradiography. Alternatively, unlabeled proteins or polypeptides may be incubated with the cells and detected with antibodies having a detectable label, such as a fluorescent molecule, attached thereto.
[0287] Specificity of cell surface binding may be analyzed by conducting a competition analysis in which various amounts of unlabeled protein or polypeptide are incubated along with the labeled protein or polypeptide. The amount of labeled protein or polypeptide bound to the cell surface decreases as the amount of competitive unlabeled protein or polypeptide increases. As a control, various amounts of an unlabeled protein or polypeptide unrelated to the labeled protein or polypeptide are included in some binding reactions. The amount of labeled protein or polypeptide bound to the cell surface does not decrease in binding reactions containing increasing amounts of unrelated unlabeled protein, indicating that the protein or polypeptide encoded by the nucleic acid binds specifically to the cell surface.
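The expected outcome of this competition experiment can be illustrated with a standard single-site competitive binding model, in which unlabeled competitor raises the apparent dissociation constant of the labeled protein. The model, the parameter values (arbitrary binding units, nM concentrations) and the assumption that labeled and unlabeled protein bind with the same affinity are illustrative; ligand depletion and non-specific binding are ignored.

```python
def labeled_bound(bmax, kd, labeled, competitor, ki=None):
    """Specifically bound labeled protein at equilibrium (single site, no depletion).

    B = Bmax * L / (L + Kd * (1 + I / Ki)); ki=None models an unrelated protein
    that does not compete for the binding site.
    """
    apparent_kd = kd if ki is None else kd * (1 + competitor / ki)
    return bmax * labeled / (labeled + apparent_kd)

competitor_nM = [0, 1, 10, 100]
same_protein = [labeled_bound(100, 1.0, 1.0, c, ki=1.0) for c in competitor_nM]
unrelated = [labeled_bound(100, 1.0, 1.0, c) for c in competitor_nM]
print([round(b, 1) for b in same_protein])  # [50.0, 33.3, 8.3, 1.0]
print([round(b, 1) for b in unrelated])     # [50.0, 50.0, 50.0, 50.0]
```

The falling series for the unlabeled copy of the same protein and the flat series for the unrelated protein correspond to the two outcomes described for specific cell-surface binding.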
[0288] As discussed above, human proteins have been shown to have a number of important physiological effects and, consequently, represent a valuable therapeutic resource. The human proteins or polypeptides made as described above may be evaluated to determine their physiological activities as described below.

EXAMPLE 22

Assaying the Expressed Proteins or Polypeptides for Cytokine, Cell Proliferation or Cell Differentiation Activity

[0289] As discussed above, some human proteins act as cytokines or may affect cellular proliferation or differentiation. Many protein factors discovered to date, including all known cytokines, have exhibited activity in one or more factor-dependent cell proliferation assays, and hence the assays serve as a convenient confirmation of cytokine activity. The activity of a protein or polypeptide of the present invention is evidenced by any one of a number of routine factor-dependent cell proliferation assays for cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and CMK. The proteins or polypeptides prepared as described above may be evaluated for their ability to regulate T cell or thymocyte proliferation in assays such as those described above or in the following references: Current Protocols in Immunology, Ed. by J. E. Coligan et al., Greene Publishing Associates and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500, 1986; Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol. 152:1756-1761, 1994.

[0290] In addition, numerous assays for cytokine production and/or the proliferation of spleen cells, lymph node cells and thymocytes are known. These include the techniques disclosed in Current Protocols in Immunology, J.E. Coligan et al., Eds., 1:3.12.1-3.12.14, John Wiley and Sons, Toronto, 1994; and Schreiber, R.D., In Current Protocols in Immunology, supra, 1:6.8.1-6.8.8.

[0291] The proteins or polypeptides prepared as described above may also be assayed for the ability to regulate the proliferation and differentiation of hematopoietic or lymphopoietic cells. Many assays for such activity are familiar to those skilled in the art, including the assays in the following references: Bottomly et al., In Current Protocols in Immunology, supra, 1:6.3.1-6.3.12; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al., Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. USA 80:2931-2938, 1983; Nordan, R., In Current Protocols in Immunology, supra, 1:6.6.1-6.6.5; Smith et al., Proc. Natl. Acad. Sci. USA 83:1857-1861, 1986; Bennett et al., in Current Protocols in Immunology, supra, 1:6.15.1; Ciarletta et al., In Current Protocols in Immunology, supra, 1:6.13.1.
[0292] The proteins or polypeptides prepared as described above may also be assayed for their ability to regulate T-cell responses to antigens. Many assays for such activity are familiar to those skilled in the art, including the assays described in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function), Chapter 6 (Cytokines and Their Cellular Receptors) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, supra; Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al., Eur. J. Immun. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988.
[0293] Those proteins or polypeptides which exhibit cytokine, cell proliferation, or cell differentiation activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which induction of cell proliferation or differentiation is beneficial. Alternatively, as described in more detail below, nucleic acids encoding these proteins or polypeptides, or nucleic acids regulating the expression of these proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 23 

Assaying the Expressed Proteins or Polypeptides for 
Activity as Immune System Regulators 

[0294] The proteins or polypeptides prepared as described above may also be evaluated for their effects as immune regulators. For example, the proteins or polypeptides may be evaluated for their activity to influence thymocyte or splenocyte cytotoxicity. Numerous assays for such activity are familiar to those skilled in the art, including the assays described in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et al., Eds., Greene Publishing Associates and Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974, 1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al., J. Virology 61:1992-1998; Bertagnolli et al., Cell. Immunol. 133:327-341, 1991; Brown et al., J. Immunol. 153:3079-3092, 1994.

[0295] The proteins or polypeptides prepared as described above may also be evaluated for their effects on T-cell dependent immunoglobulin responses and isotype switching. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Maliszewski, J. Immunol. 144:3028-3033, 1990; Mond et al., in Current Protocols in Immunology, supra, 1:3.8.1-3.8.16.
[0296] The proteins or polypeptides prepared as described above may also be evaluated for their effect on immune effector cells, including their effect on Th1 cells and cytotoxic lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, supra; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992.
[0297] The proteins or polypeptides prepared as described above may also be evaluated for their effect on dendritic cell mediated activation of naive T-cells. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., J. Exp. Med. 173:549-559, 1991; Macatonia et al., J. Immunol. 154:5071-5079, 1995; Porgador et al., J. Exp. Med. 182:255-260, 1995; Nair et al., J. Virol. 67:4062-4069, 1993; Huang et al., Science 264:961-965, 1994; Macatonia et al., J. Exp. Med. 169:1255-1264, 1989; Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et al., J. Exp. Med. 172:631-640, 1990.
[0298] The proteins or polypeptides prepared as described above may also be evaluated for their influence on the lifetime of lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Darzynkiewicz et al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et al., Cancer Res. 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, J. Immunol. 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; Gorczyca et al., Int. J. Oncol. 1:639-648, 1992.
[0299] The proteins or polypeptides prepared as described above may also be evaluated for their influence on early steps of T-cell commitment and development. Numerous assays for such activity are familiar to those skilled in the art, including without limitation the assays disclosed in the following references: Antica et al., Blood 84:111-117, 1994; Fine et al., Cell. Immunol. 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551, 1991.

[0300] Those proteins or polypeptides which exhibit activity as immune system regulators may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of immune activity is beneficial. For example, the protein or polypeptide may be useful in the treatment of various immune deficiencies and disorders (including severe combined immunodeficiency), e.g., in regulating (up or down) growth and proliferation of T and/or B lymphocytes, as well as affecting the cytolytic activity of NK cells and other cell populations. These immune deficiencies may be genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may result from autoimmune disorders. More specifically, infectious diseases caused by viral, bacterial, fungal or other infection may be treatable using the protein or polypeptide, including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania spp., Plasmodium, and various fungal infections such as candidiasis. Of course, in this regard, a protein or polypeptide may also be useful where a boost to the immune system generally may be desirable, i.e., in the treatment of cancer.

[0301] Alternatively, the proteins or polypeptides prepared as described above may be used in the treatment of autoimmune disorders including, for example, connective tissue disease, multiple sclerosis, systemic lupus erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre syndrome, autoimmune thyroiditis, insulin-dependent diabetes mellitus, myasthenia gravis, graft-versus-host disease and autoimmune inflammatory eye disease. Such a protein or polypeptide may also be useful in the treatment of allergic reactions and conditions, such as asthma (particularly allergic asthma) or other respiratory problems. Other conditions in which immune suppression is desired (including, for example, organ transplantation) may also be treatable using the protein or polypeptide.
[0302] Using the proteins or polypeptides of the invention, it may also be possible to regulate immune responses either up or down. Down regulation may involve inhibiting or blocking an immune response already in progress or may involve preventing the induction of an immune response. The functions of activated T-cells may be inhibited by suppressing T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression of T cell responses is generally an active non-antigen-specific process which requires continuous exposure of the T cells to the suppressive agent. Tolerance, which involves inducing non-responsiveness or anergy in T cells, is distinguishable from immunosuppression in that it is generally antigen-specific and persists after the end of exposure to the tolerizing agent. Operationally, tolerance can be demonstrated by the lack of a T cell response upon reexposure to specific antigen in the absence of the tolerizing agent.
[0303] Down regulating or preventing one or more antigen functions (including without limitation B lymphocyte antigen functions, such as, for example, B7 costimulation), e.g., preventing high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, skin and organ transplantation and in graft-versus-host disease (GVHD). For example, blockage of T cell function should result in reduced tissue destruction in tissue transplantation. Typically, in tissue transplants, rejection of the transplant is initiated through its recognition as foreign by T cells, followed by an immune reaction that destroys the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, monomeric form of a peptide having B7-2 activity alone or in conjunction with a monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior to transplantation, can lead to the binding of the molecule to the natural ligand(s) on the immune cells without transmitting the corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, it may also be necessary to block the function of a combination of B lymphocyte antigens.
[0304] The efficacy of particular blocking reagents in preventing organ transplant rejection or GVHD can be assessed using animal models that are predictive of efficacy in humans. Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA, 89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the effect of blocking B lymphocyte antigen function in vivo on the development of that disease.
[0305] Blocking antigen function may also be therapeutically useful for treating autoimmune diseases. Many autoimmune disorders are the result of inappropriate activation of T cells that are reactive against self tissue and which promote the production of cytokines and autoantibodies involved in the pathology of the diseases. Preventing the activation of autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents which block costimulation of T cells by disrupting receptor/ligand interactions of B lymphocyte antigens can be used to inhibit T cell activation and prevent production of autoantibodies or T cell-derived cytokines which are potentially involved in the disease process. Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T cells which could lead to long-term relief from the disease. The efficacy of blocking reagents in preventing or alleviating autoimmune disorders can be determined using a number of well-characterized animal models of human autoimmune diseases. Examples include murine experimental autoimmune encephalitis, systemic lupus erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840-856).
[0306] Upregulation of an antigen function (preferably a B lymphocyte antigen function), as a means of upregulating immune responses, may also be useful in therapy. Upregulation of immune responses may involve either enhancing an existing immune response or eliciting an initial immune response, as shown by the following examples. For instance, enhancing an immune response through stimulating B lymphocyte antigen function may be useful in cases of viral infection. In addition, systemic viral diseases such as influenza, the common cold, and encephalitis might be alleviated by the systemic administration of stimulatory forms of B lymphocyte antigens.

[0307] Alternatively, antiviral immune responses may be enhanced in an infected patient by removing T cells from the patient, costimulating the T cells in vitro with viral antigen-pulsed APCs either expressing the proteins or polypeptides described above or together with a stimulatory form of the protein or polypeptide, and reintroducing the in vitro primed T cells into the patient. The infected cells would now be capable of delivering a costimulatory signal to T cells in vivo, thereby activating the T cells.

[0308] In another application, upregulation or enhancement of antigen function (preferably B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) transfected with one of the above-described nucleic acids encoding a protein or polypeptide can be administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the tumor cell can be transfected to express a combination of peptides. For example, tumor cells obtained from a patient can be transfected ex vivo with an expression vector directing the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned to the patient to result in expression of the peptides on the surface of the transfected cell. Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in vivo.

[0309] The presence of the protein or polypeptide encoded by the nucleic acids described above having the activity of a B lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation signal to T cells to induce a T cell mediated immune response against the transfected tumor cells. In addition, tumor cells which lack or which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules can be transfected with nucleic acids encoding all or a portion (e.g., a cytoplasmic-domain truncated portion) of an MHC class I α chain and β2-microglobulin or an MHC class II α chain and an MHC class II β chain to thereby express MHC class I or MHC class II proteins on the cell surface, respectively. Expression of the appropriate MHC class I or class II molecules in conjunction with a peptide having the activity of a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated immune response against the transfected tumor cell. Optionally, a nucleic acid encoding an antisense construct which blocks expression of an MHC class II associated protein, such as the invariant chain, can also be cotransfected with a DNA encoding a protein or polypeptide having the activity of a B lymphocyte antigen to promote presentation of tumor associated antigens and induce tumor specific immunity. Thus, the induction of a T cell mediated immune response in a human subject may be sufficient to overcome tumor-specific tolerance in the subject. Alternatively, as described in more detail below, nucleic acids encoding these immune system regulator proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

EXAMPLE 24 

Assaying the Expressed Proteins or Polypeptides for 
Hematopoiesis Regulating Activity

[0310] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their hematopoiesis regulating activity. For example, the effect of the proteins or polypeptides on embryonic stem cell differentiation may be evaluated. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Johansson et al., Cell. Biol. 15:141-151, 1995; Keller et al., Mol. Cell. Biol. 13:473-486, 1993; McClanahan et al., Blood 81:2903-2915, 1993.
[0311] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their influence on the lifetime of stem cells and stem cell differentiation. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Freshney, M.G., Methylcellulose Colony Forming Assays, in Culture of Hematopoietic Cells, R.I. Freshney, et al. eds., pp. 265-268, Wiley-Liss, Inc., New York, NY, 1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; McNiece, I.K. and Briddell, R.A., Primitive Hematopoietic Colony Forming Cells with High Proliferative Potential, in Culture of Hematopoietic Cells, R.I. Freshney, et al. eds., Vol pp. 23-39, Wiley-Liss, Inc., New York, NY, 1994; Neben et al., Experimental Hematology 22:353-359, 1994; Ploemacher, R.E., Cobblestone Area Forming Cell Assay, in Culture of Hematopoietic Cells, R.I. Freshney, et al. eds., pp. 1-21, Wiley-Liss, Inc., New York, NY, 1994; Spooncer, E., Dexter, M. and Allen, T., Long Term Bone Marrow Cultures in the Presence of Stromal Cells, in Culture of Hematopoietic Cells, R.I. Freshney, et al. eds., pp. 163-179, Wiley-Liss, Inc., New York, NY, 1994; and Sutherland, H.J., Long Term Culture Initiating Cell Assay, in Culture of Hematopoietic Cells, R.I. Freshney, et al. eds., pp. 139-162, Wiley-Liss, Inc., New York, NY, 1994.

[0312] Those proteins or polypeptides which exhibit hematopoiesis regulatory activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of hematopoiesis is beneficial. For example, a protein or polypeptide of the present invention may be useful in regulation of hematopoiesis and, consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal biological activity in support of colony forming cells or of factor-dependent cell lines indicates involvement in regulating hematopoiesis, e.g. in supporting the growth and proliferation of erythroid progenitor cells alone or in combination with other cytokines, thereby indicating utility, for example, in treating various anemias or for use in conjunction with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or erythroid cells; in supporting the growth and proliferation of myeloid cells such as granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for example, in conjunction with chemotherapy to prevent or treat consequent myelosuppression; in supporting the growth and proliferation of megakaryocytes and consequently of platelets, thereby allowing prevention or treatment of various platelet disorders such as thrombocytopenia, and generally for use in place of or complementary to platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic stem cells which are capable of maturing to any and all of the above-mentioned hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such as those usually treated with transplantation, including, without limitation, aplastic anemia and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell compartment post irradiation/chemotherapy, either in vivo or ex vivo (i.e., in conjunction with bone marrow transplantation or with peripheral progenitor cell transplantation (homologous or heterologous)) as normal cells or genetically manipulated for gene therapy. Alternatively, as described in more detail below, nucleic acids encoding these proteins or polypeptides or nucleic acids regulating the expression of these proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

EXAMPLE 25 

Assaying the Expressed Proteins or Polypeptides for 
Regulation of Tissue Growth 

[0313] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their effect on tissue growth. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in International Patent Publication No. WO95/16035, International Patent Publication No. WO95/05846 and International Patent Publication No. WO91/07491.

[0314] Assays for wound healing activity include, without limitation, those described in: Winter, Epidermal Wound Healing, pp. 71-112 (Maibach, H.I. and Rovee, D.T., eds.), Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. Invest. Dermatol. 71:382-84 (1978).

[0315] Those proteins or polypeptides which are involved in the regulation of tissue growth may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of tissue growth is beneficial. For example, a protein or polypeptide may have utility in compositions used for bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for wound healing and tissue repair and replacement, and in the treatment of burns, incisions and ulcers.
[0316] A protein or polypeptide encoded by the nucleic acids described above which induces cartilage and/or bone growth in circumstances where bone is not normally formed has application in the healing of bone fractures and cartilage damage or defects in humans and other animals. Such a preparation employing a protein or polypeptide of the invention may have prophylactic use in closed as well as open fracture reduction and also in the improved fixation of artificial joints. De novo bone synthesis induced by an osteogenic agent contributes to the repair of congenital, trauma induced, or oncologic resection induced craniofacial defects, and also is useful in cosmetic plastic surgery.
[0317] A protein or polypeptide of this invention may also be used in the treatment of periodontal disease, and in other tooth repair processes. Such agents may provide an environment to attract bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of progenitors of bone-forming cells. A protein of the invention may also be useful in the treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase activity, osteoclast activity, etc.) mediated by inflammatory processes.

[0318] Another category of tissue regeneration activity that may be attributable to the proteins or polypeptides encoded by the nucleic acids described above is tendon/ligament formation. A protein or polypeptide encoded by the nucleic acids described above, which induces tendon/ligament-like tissue or other tissue formation in circumstances where such tissue is not normally formed, has application in the healing of tendon or ligament tears, deformities and other tendon or ligament defects in humans and other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein may have prophylactic use in preventing damage to tendon or ligament tissue, as well as use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced by a protein or polypeptide of the present invention contributes to the repair of tendon or ligament defects of congenital, traumatic or other origin and is also useful in cosmetic plastic surgery for attachment or repair of tendons or ligaments. The proteins or polypeptides of the present invention may provide an environment to attract tendon- or ligament-forming cells, stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or progenitors ex vivo for return in vivo to effect tissue repair. The proteins or polypeptides of the invention may also be useful in the treatment of tendinitis, carpal tunnel syndrome and other tendon or ligament defects. The therapeutic compositions may also include an appropriate matrix and/or sequestering agent as a carrier as is well known in the art.

[0319] The proteins or polypeptides of the present invention may also be useful for proliferation of neural cells and for regeneration of nerve and brain tissue, i.e., for the treatment of central and peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. More specifically, a protein or polypeptide may be used in the treatment of diseases of the peripheral nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. Further conditions which may be treated in accordance with the present invention include mechanical and traumatic disorders, such as spinal cord disorders, head trauma and cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from chemotherapy or other medical therapies may also be treatable using a protein or polypeptide of the invention.
[0320] Proteins or polypeptides of the invention may also be useful to promote better or faster closure of non-healing wounds, including without limitation pressure ulcers, ulcers associated with vascular insufficiency, surgical and traumatic wounds, and the like.
[0321] It is expected that a protein or polypeptide of the present invention may also exhibit activity for generation or regeneration of other tissues, such as organs (including, for example, pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) and vascular (including vascular endothelium) tissue, or for promoting the growth of cells comprising such tissues. Part of the desired effects may be by inhibition or modulation of fibrotic scarring to allow normal tissue to regenerate. A protein or polypeptide of the invention may also exhibit angiogenic activity.
[0322] A protein or polypeptide of the present invention may also be useful for gut protection or regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, and conditions resulting from systemic cytokine damage.

[0323] A protein or polypeptide of the present invention may also be useful for promoting or inhibiting differentiation of tissues described above from precursor tissues or cells, or for inhibiting the growth of tissues described above.

[0324] Alternatively, as described in more detail below, nucleic acids encoding tissue growth regulating activity proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

EXAMPLE 26

Assaying the Expressed Proteins or Polypeptides for
Regulation of Reproductive Hormones

[0325] The proteins or polypeptides of the present invention may also be evaluated for their ability to regulate reproductive hormones, such as follicle stimulating hormone. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Vale et al., Endocrinol. 91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985; Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986; Chapter 6.12 in Current Protocols in Immunology, J.E. Coligan et al., Eds., Greene Publishing Associates and Wiley-Interscience; Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995; Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. Immunol. 152:5860-5867, 1994; Johnston et al., J. Immunol. 153:1762-1768, 1994.
[0326] Those proteins or polypeptides which exhibit activity as reproductive hormones or regulators of cell movement may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of reproductive hormones is beneficial. For example, a protein or polypeptide may exhibit activin- or inhibin-related activities. Inhibins are characterized by their ability to inhibit the release of follicle stimulating hormone (FSH), while activins are characterized by their ability to stimulate the release of FSH. Thus, a protein or polypeptide of the present invention, alone or in heterodimers with a member of the inhibin α family, may be useful as a contraceptive based on the ability of inhibins to decrease fertility in female mammals and decrease spermatogenesis in male mammals. Administration of sufficient amounts of other inhibins can induce infertility in these mammals. Alternatively, the protein or polypeptide of the invention, as a homodimer or as a heterodimer with other protein subunits of the inhibin-β group, may be useful as a fertility inducing therapeutic, based upon the ability of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for example, United States Patent 4,798,885. A protein or polypeptide of the invention may also be useful for advancement of the onset of fertility in sexually immature mammals, so as to increase the lifetime reproductive performance of domestic animals such as cows, sheep and pigs.
[0327] Alternatively, as described in more detail below, nucleic acids encoding reproductive hormone regulating activity proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 27 

Assaying the Expressed Proteins or Polypeptides For 
Chemotactic/Chemokinetic Activity 

[0328] The proteins or polypeptides of the present invention may also be evaluated for chemotactic/chemokinetic activity. For example, a protein or polypeptide of the present invention may have chemotactic or chemokinetic activity (e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. Chemotactic and chemokinetic proteins or polypeptides can be used to mobilize or attract a desired cell population to a desired site of action. Chemotactic or chemokinetic proteins or polypeptides provide particular advantages in treatment of wounds and other trauma to tissues, as well as in treatment of localized infections. For example, attraction of lymphocytes, monocytes or neutrophils to tumors or sites of infection may result in improved immune responses against the tumor or infecting agent.
[0329] A protein or polypeptide has chemotactic activity for a particular cell population if it can stimulate, directly or indirectly, the directed orientation or movement of such cell population. Preferably, the protein or polypeptide has the ability to directly stimulate directed movement of cells. Whether a particular protein or polypeptide has chemotactic activity for a population of cells can be readily determined by employing such protein or polypeptide in any known assay for cell chemotaxis.

[0330] The activity of a protein or polypeptide of the 
invention may, among other means, be measured by the 
following methods: 

[0331] Assays for chemotactic activity (which will identify proteins or polypeptides that induce or prevent chemotaxis) consist of assays that measure the ability of a protein or polypeptide to induce the migration of cells across a membrane as well as the ability of a protein or polypeptide to induce the adhesion of one cell population to another cell population. Suitable assays for movement and adhesion include, without limitation, those described in: Current Protocols in Immunology, Ed. by J.E. Coligan, A.M. Kruisbeek, D.H. Margulies, E.M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley-Interscience, Chapter 6.12: 6.12.1-6.12.28; Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995; Mueller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. Immunol. 152:5860-5867, 1994; Johnston et al., J. Immunol. 153:1762-1768, 1994.

EXAMPLE 28 

Assaying the Expressed Proteins or Polypeptides for
Regulation of Blood Clotting 

[0332] The proteins or polypeptides of the present invention may also be evaluated for their effects on blood clotting. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub, Prostaglandins 35:467-474, 1988.
[0333] Those proteins or polypeptides which are involved in the regulation of blood clotting may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of blood clotting is beneficial. For example, a protein or polypeptide of the invention may also exhibit hemostatic or thrombolytic activity. As a result, such a protein or polypeptide is expected to be useful in treatment of various coagulation disorders (including hereditary disorders, such as hemophilias) or to enhance coagulation and other hemostatic events in treating wounds resulting from trauma, surgery or other causes. A protein or polypeptide of the invention may also be useful for dissolving or inhibiting formation of thromboses and for treatment and prevention of conditions resulting therefrom (such as infarction of cardiac and central nervous system vessels (e.g., stroke)). Alternatively, as described in more detail below, nucleic acids encoding blood clotting activity proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 29 

Assaying the Expressed Proteins or Polypeptides for 
Involvement in Receptor/Ligand Interactions 

[0334] The proteins or polypeptides of the present invention may also be evaluated for their involvement in receptor/ligand interactions. Numerous assays for such involvement are familiar to those skilled in the art, including the assays disclosed in the following references: Chapter 7 (7.28.1-7.28.22) in Current Protocols in Immunology, J.E. Coligan et al., Eds., Greene Publishing Associates and Wiley-Interscience; Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160, 1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 80:661-670, 1995; Gyuris et al., Cell 75:791-803, 1993.
[0335] For example, the proteins or polypeptides of the present invention may also demonstrate activity as receptors, receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such receptors and ligands include, without limitation, cytokine receptors and their ligands, receptor kinases and their ligands, receptor phosphatases and their ligands, receptors involved in cell-cell interactions and their ligands (including without limitation, cellular adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs involved in antigen presentation, antigen recognition and development of cellular and humoral immune responses). Receptors and ligands are also useful for screening of potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. A protein or polypeptide of the present invention (including, without limitation, fragments of receptors and ligands) may be useful as an inhibitor of receptor/ligand interactions. Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides involved in receptor/ligand interactions or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 30

Assaying the Proteins or Polypeptides for Anti-Inflammatory Activity

[0336] The proteins or polypeptides of the present invention may also be evaluated for anti-inflammatory activity. The anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the inflammatory process, by inhibiting or promoting cell extravasation, or by stimulating or suppressing production of other factors which more directly inhibit or promote an inflammatory response. Proteins or polypeptides exhibiting such activities can be used to treat inflammatory conditions including chronic or acute conditions, including without limitation inflammation associated with infection (such as septic shock, sepsis or systemic inflammatory response syndrome), ischemia-reperfusion injury, endotoxin lethality, arthritis, complement-mediated hyperacute rejection, nephritis, cytokine- or chemokine-induced lung injury, inflammatory bowel disease, Crohn's disease, or conditions resulting from overproduction of cytokines such as TNF or IL-1. Proteins or polypeptides of the invention may also be useful to treat anaphylaxis and hypersensitivity to an antigenic substance or material. Alternatively, as described in more detail below, nucleic acids encoding anti-inflammatory activity proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 31

Assaying the Expressed Proteins or Polypeptides for
Tumor Inhibition Activity

[0337] The proteins or polypeptides of the present invention may also be evaluated for tumor inhibition activity. In addition to the activities described above for immunological treatment or prevention of tumors, a protein or polypeptide of the invention may exhibit other anti-tumor activities. A protein or polypeptide may inhibit tumor growth directly or indirectly (such as, for example, via ADCC). A protein or polypeptide may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such as, for example, by inhibiting angiogenesis), by causing production of other factors, agents or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting factors, agents or cell types which promote tumor growth. Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides with tumor inhibition activity or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.
[0338] A protein or polypeptide of the invention may also exhibit one or more of the following additional activities or effects: inhibiting the growth, infection or function of, or killing, infectious agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting (suppressing or enhancing) bodily characteristics, including, without limitation, height, weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ or body part size or shape (such as, for example, breast augmentation or diminution, change in bone form or shape); affecting biorhythms or circadian cycles or rhythms; affecting the fertility of male or female subjects; affecting the metabolism, catabolism, anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein, carbohydrate, vitamins, minerals, cofactors or other nutritional factors or component(s); affecting behavioral characteristics, including, without limitation, appetite, libido, stress, cognition (including cognitive disorders), depression (including depressive disorders) and violent behaviors; providing analgesic effects or other pain reducing effects; promoting differentiation and growth of embryonic stem cells in lineages other than hematopoietic lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of the enzyme and treating deficiency-related diseases; treatment of hyperproliferative disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for example, the ability to bind antigens or complement); and the ability to act as an antigen in a vaccine composition to raise an immune response against such protein or another material or entity which is cross-reactive with such protein. Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides involved in any of the above-mentioned activities or nucleic acids regulating the expression of such proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 32 

Identification of Proteins or Polypeptides which Interact 
with Proteins or Polypeptides of the Present Invention 

[0339] Proteins or polypeptides which interact with the proteins or polypeptides of the present invention, such as receptor proteins, may be identified using two-hybrid systems such as the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual accompanying the kit, nucleic acids encoding the proteins or polypeptides of the present invention are inserted into an expression vector such that they are in frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. cDNAs in a cDNA library which encode proteins or polypeptides which might interact with the proteins or polypeptides of the present invention are inserted into a second expression vector such that they are in frame with DNA encoding the activation domain of GAL4. The two expression plasmids are transformed into yeast, and the yeast are plated on selection medium which selects for expression of selectable markers on each of the expression vectors as well as GAL4-dependent expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for GAL4-dependent lacZ expression. Those cells which are positive in both the histidine selection and the lacZ assay contain plasmids encoding proteins or polypeptides which interact with the proteins or polypeptides of the present invention.
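
The scoring rule above, in which a transformant counts as a candidate interactor only when it is positive in both the histidine selection and the lacZ assay, can be illustrated with a minimal Python sketch. The `Colony` record, its field names and the example plate are hypothetical and stand in for however the assay readouts are actually recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Colony:
    """Readouts for one yeast transformant (hypothetical data model)."""
    colony_id: str
    grows_without_histidine: bool  # GAL4-dependent HIS3 expression
    lacz_positive: bool            # GAL4-dependent lacZ expression


def candidate_interactors(colonies):
    """Return IDs of colonies positive in both the histidine selection and the lacZ assay."""
    return [c.colony_id for c in colonies
            if c.grows_without_histidine and c.lacz_positive]


if __name__ == "__main__":
    plate = [
        Colony("A1", True, True),
        Colony("A2", True, False),   # HIS3-positive only: not scored as an interactor
        Colony("A3", False, False),
    ]
    print(candidate_interactors(plate))  # ['A1']
```

Requiring both reporters, rather than either one, is what filters out transformants that activate only one GAL4-dependent promoter.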

[0340] Alternatively, the system described in Lustig et al., Methods in Enzymology 283:83-99 (1997), may be used for identifying molecules which interact with the proteins or polypeptides of the present invention. In such systems, in vitro transcription reactions are performed on a pool of vectors containing nucleic acid inserts which encode the proteins or polypeptides of the present invention. The nucleic acid inserts are cloned downstream of a promoter which drives in vitro transcription. The resulting pools of mRNAs are introduced into Xenopus laevis oocytes. The oocytes are then assayed for a desired activity.

[0341] Alternatively, the pooled in vitro transcription products produced as described above may be translated in vitro. The pooled in vitro translation products can be assayed for a desired activity or for interaction with a known protein or polypeptide.
[0342] Proteins, polypeptides or other molecules interacting with proteins or polypeptides of the present invention can be found by a variety of additional techniques. In one method, affinity columns containing the protein or polypeptide of the present invention can be constructed. In some versions of this method, the affinity column contains chimeric proteins in which the protein or polypeptide of the present invention is fused to glutathione S-transferase. A mixture of cellular proteins or a pool of expressed proteins as described above is applied to the affinity column. Molecules interacting with the protein or polypeptide attached to the column can then be isolated and analyzed on 2-D electrophoresis gel as described in Ramunsen et al., Electrophoresis, 18, 588-598 (1997). Alternatively, the molecules retained on the affinity column can be purified by electrophoresis-based methods and sequenced. The same method can be used to isolate antibodies, to screen phage display products, or to screen phage display human antibodies.
[0343] Molecules interacting with the proteins or polypeptides of the present invention can also be screened by using an Optical Biosensor as described in Edwards & Leatherbarrow, Analytical Biochemistry, 246, 1-6 (1997). The main advantage of the method is that it allows the determination of the association rate between the protein or polypeptide and other interacting molecules. Thus, it is possible to specifically select interacting molecules with a high or low association rate. Typically a target molecule is linked to the sensor surface (through a carboxymethyl dextran matrix) and a sample of test molecules is placed in contact with the target molecules. The binding of a test molecule to the target molecule causes a change in the refractive index and/or thickness. This change is detected by the Biosensor provided it occurs in the evanescent field (which extends a few hundred nanometers from the sensor surface). In these screening assays, the target molecule can be one of the proteins or polypeptides of the present invention and the test sample can be a collection of proteins, polypeptides or other molecules extracted from tissues or cells, a pool of expressed proteins, combinatorial peptide and/or chemical libraries, or phage displayed peptides. The tissues or cells from which the test molecules are extracted can originate from any species.
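
The association rate mentioned above is often extracted from biosensor data by assuming a simple 1:1 (Langmuir) interaction, under which the association-phase signal rises as R(t) = R_eq(1 - exp(-k_obs t)) and the observed rate constant depends linearly on analyte concentration, k_obs = k_a C + k_d. The Python sketch below assumes that model (the text above does not specify one) and fits k_a and k_d from k_obs values measured at several concentrations; the function names and the synthetic numbers are illustrative only.

```python
import math


def association_response(t, r_eq, k_obs):
    """Association-phase signal for a 1:1 model: R(t) = R_eq * (1 - exp(-k_obs * t))."""
    return r_eq * (1.0 - math.exp(-k_obs * t))


def fit_ka_kd(concentrations, k_obs_values):
    """Ordinary least-squares fit of k_obs = ka * C + kd.

    Concentrations in M and k_obs in 1/s; returns (ka in 1/(M*s), kd in 1/s).
    """
    if len(concentrations) != len(k_obs_values) or len(concentrations) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_k = sum(k_obs_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (k - mean_k)
              for c, k in zip(concentrations, k_obs_values))
    ka = sxy / sxx              # slope: association rate constant
    kd = mean_k - ka * mean_c   # intercept: dissociation rate constant
    return ka, kd


if __name__ == "__main__":
    # Synthetic k_obs values generated with ka = 1e5 /M/s and kd = 1e-3 /s.
    conc = [10e-9, 50e-9, 100e-9]
    k_obs = [1e5 * c + 1e-3 for c in conc]
    ka, kd = fit_ka_kd(conc, k_obs)
    print(f"ka = {ka:.3g} /M/s, kd = {kd:.3g} /s")
```

Fitting the concentration dependence, rather than a single curve, separates k_a from k_d; under this model, a test molecule with a larger fitted k_a is one with a higher association rate in the sense used above.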
[0344] In other methods, a target protein or polypeptide is immobilized and the test population is a collection of unique proteins or polypeptides of the present invention.

[0345] To study the interaction of the proteins or polypeptides of the present invention with drugs, the microdialysis coupled to HPLC method described by Wang et al., Chromatographia, 44, 205-208 (1997), or the affinity capillary electrophoresis method described by Busch et al., J. Chromatogr. 777:311-328 (1997), can be used.

[0346] The system described in U.S. Patent No. 5,654,150 may also be used to identify molecules which interact with the proteins or polypeptides of the present invention. In this system, pools of nucleic acids encoding the proteins or polypeptides of the present invention are transcribed and translated in vitro and the reaction products are assayed for interaction with a known polypeptide or antibody.

[0347] It will be appreciated by those skilled in the art that the proteins or polypeptides of the present invention may be assayed for numerous activities in addition to those specifically enumerated above. For example, the expressed proteins or polypeptides may be evaluated for applications involving control and regulation of inflammation, tumor proliferation or metastasis, infection, or other clinical conditions. In addition, the proteins or polypeptides may be useful as nutritional agents or cosmetic agents.

[0348] The proteins or polypeptides of the present invention may be used to generate antibodies capable of specifically binding to the proteins or polypeptides of the present invention. The antibodies may be monoclonal antibodies or polyclonal antibodies. As used herein, "antibody" refers to a polypeptide or group of polypeptides comprising at least one binding domain, where a binding domain is formed from the folding of variable domains of an antibody molecule to form three-dimensional binding spaces with an internal surface shape and charge distribution complementary to the features of an antigenic determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies include recombinant proteins comprising the binding domains, as well as fragments, including Fab, Fab', F(ab)2, and F(ab')2 fragments.
[0349] As used herein, an "antigenic determinant" is the portion of an antigen molecule that determines the specificity of the antigen-antibody reaction. An "epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3 amino acids in a spatial conformation which is unique to the epitope. Generally an epitope consists of at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for determining the amino acids which make up an epitope include X-ray crystallography, 2-dimensional nuclear magnetic resonance, and epitope mapping, e.g., the Pepscan method described by H. Mario Geysen et al., 1984, Proc. Natl. Acad. Sci. U.S.A. 81:3998-4002; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.
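
Pepscan-style epitope mapping relies on synthesizing a series of overlapping peptides that together tile the protein sequence, so that each candidate linear epitope is present on at least one peptide. A minimal Python sketch of generating such a tiling follows; the default window of 6 residues echoes the typical minimum epitope size given above, while the step size, the function name and the example sequence are illustrative assumptions rather than parameters taken from the cited method.

```python
def overlapping_peptides(sequence, length=6, step=1):
    """Tile a protein sequence with overlapping peptides.

    Returns a list of (start, peptide) pairs, where start is the 1-based
    position of the peptide's first residue in the parent sequence.
    """
    if length < 1 or step < 1:
        raise ValueError("length and step must be positive integers")
    seq = sequence.upper()
    return [(i + 1, seq[i:i + length])
            for i in range(0, len(seq) - length + 1, step)]


if __name__ == "__main__":
    # Hypothetical 8-residue sequence tiled into 6-mers offset by one residue.
    for start, peptide in overlapping_peptides("MKTAYIAK"):
        print(start, peptide)
```

Each peptide would then be tested for antibody binding; a run of reactive peptides sharing a common stretch of residues points to the epitope.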

[0350] In some embodiments, the antibodies may be capable of specifically binding to a protein or polypeptide encoded by EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. In some embodiments, the antibody may be capable of binding an antigenic determinant or an epitope in a protein or polypeptide encoded by EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids.
[0351] In other embodiments, the antibodies may be capable of specifically binding to an EST-related polypeptide, fragment of an EST-related polypeptide, positional segment of an EST-related polypeptide or fragment of a positional segment of an EST-related polypeptide. In some embodiments, the antibody may be capable of binding an antigenic determinant or an epitope in an EST-related polypeptide, fragment of an EST-related polypeptide, positional segment of an EST-related polypeptide or fragment of a positional segment of an EST-related polypeptide.
[0352] In the case of secreted proteins, the antibodies may be capable of binding a full-length protein encoded by a nucleic acid of the present invention, a mature protein (i.e., the protein generated by cleavage of the signal peptide) encoded by a nucleic acid of the present invention, or a signal peptide encoded by a nucleic acid of the present invention.

EXAMPLE 33 

Production of an Antibody to a Human Polypeptide or 
Protein 

[0353] The above-described EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or nucleic acids encoding EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides, are operably linked to promoters and introduced into cells as described above.

[0354] In the case of secreted proteins, nucleic acids encoding the full protein (i.e., the mature protein and the signal peptide), nucleic acids encoding the mature protein (i.e., the protein generated by cleavage of the signal peptide), or nucleic acids encoding the signal peptide are operably linked to promoters and introduced into cells as described above.

[0355] The encoded proteins or polypeptides are then substantially purified or isolated as described above. The concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few µg/ml. Monoclonal or polyclonal antibody to the protein or polypeptide can then be prepared as follows:

1. Monoclonal Antibody Production by Hybridoma Fusion

[0356] Monoclonal antibody to epitopes of any of the proteins or polypeptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler and Milstein, Nature 256:495 (1975) or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein or peptides derived therefrom over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, Meth. Enzymol. 70:419 (1980). Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. G. et al., in Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2.

2. Polyclonal Antibody Production by Immunization

[0357] Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein or polypeptide can be prepared by immunizing suitable animals with the expressed protein or peptides derived therefrom, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animal responses vary depending on the site of inoculation and the dose, with either inadequate or excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
[0358] Booster injections can be given at regular intervals, and antiserum harvested when its antibody titer, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental Immunology, D. Wier (ed.), Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 nM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.), Amer. Soc. for Microbiol., Washington, D.C. (1980).
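
The harvest rule above (bleed repeatedly after boosting and harvest once the titer begins to fall) can be expressed as a small Python function. The titer values are assumed to be reciprocal endpoint dilutions from successive bleeds, for example from double immunodiffusion; the function name and the example series are hypothetical.

```python
def harvest_index(titers):
    """Return the index of the first bleed whose titer falls below the best
    titer seen so far, or None if the titer has not yet begun to fall."""
    best = None
    for i, titer in enumerate(titers):
        if best is not None and titer < best:
            return i
        best = titer if best is None else max(best, titer)
    return None


if __name__ == "__main__":
    # Reciprocal titers from successive bleeds (hypothetical values).
    series = [8, 32, 64, 64, 32]
    print(harvest_index(series))  # 4
```

A plateau (equal successive titers) is not treated as a fall, so harvesting waits for an actual decline, as the text describes.
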
[0359] Antibody preparations prepared according to either of the above protocols are useful in a variety of contexts. In particular, the antibodies may be used in immunoaffinity chromatography techniques such as those described below to facilitate large-scale isolation, purification, or enrichment of the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or for the isolation, purification or enrichment of EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.

[0360] In the case of secreted proteins, the antibodies may be used for the isolation, purification, or enrichment of the full protein (i.e., the mature protein and the signal peptide), the mature protein (i.e., the protein generated by cleavage of the signal peptide), or the signal peptide.

[0361] Additionally, the antibodies may be used in immunoaffinity chromatography techniques such as those described below to isolate, purify, or enrich polypeptides which have been linked to the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or to isolate, purify, or enrich EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.
[0362] The antibodies may also be used to determine the cellular localization of the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or the cellular localization of EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.

[0363] In addition, the antibodies may also be used to determine the cellular localization of polypeptides which have been linked to the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or of polypeptides which have been linked to EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.
[0364] The antibodies may also be used in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they may also be used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample or to identify the type of tissue present in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.

V. Use of 5' ESTs and Consensus Contigated 5' ESTs or Sequences Obtainable Therefrom or Portions Thereof as Reagents

[0365] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used as reagents in isolation procedures, diagnostic assays, and forensic procedures. For example, sequences from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be detectably labeled and used as probes to isolate other sequences capable of hybridizing to them. In addition, the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to design PCR primers to be used in isolation, diagnostic, or forensic procedures.

1. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in isolation, diagnostic and forensic procedures

EXAMPLE 34 

Preparation of PCR Primers and Amplification of DNA

[0366] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to prepare PCR primers for a variety of applications, including isolation procedures for cloning nucleic acids capable of hybridizing to such sequences, diagnostic techniques and forensic techniques. In some embodiments, the PCR primers are at least 10, 15, 18, 20, 23, 25, 28, 30, 40, or 50 nucleotides in length. In some embodiments, the PCR primers may be more than 30 bases in length. It is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see Molecular Cloning to Genetic Engineering, White, B.A., Ed., in Methods in Molecular Biology 67, Humana Press, Totowa (1997). In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites.
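
The guidance that primer pairs should have approximately the same G/C ratio, so that their melting temperatures match, can be checked with a short Python sketch. It uses the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), which is only a rough estimate intended for short oligonucleotides; the 2 °C tolerance, the function names and the example primers are assumptions for illustration. The last function gives the idealized yield after n cycles if every cycle exactly doubles the target, which real reactions only approach.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    if not p:
        raise ValueError("primer must not be empty")
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Wallace-rule melting temperature estimate, in degrees C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def pair_is_balanced(forward, reverse, max_tm_diff=2):
    """True if the two primers' estimated Tm values differ by at most max_tm_diff."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_diff


def ideal_copies(start_copies, cycles):
    """Copies of the target after `cycles` rounds, assuming perfect doubling."""
    return start_copies * 2 ** cycles


if __name__ == "__main__":
    fwd = "GATTACAGCTGGACCTTAGC"  # hypothetical 20-mer, 50% G/C
    rev = "CCTAGGTCAATGCAGTTCGA"  # hypothetical 20-mer, 50% G/C
    print(gc_fraction(fwd), wallace_tm(fwd), wallace_tm(rev), pair_is_balanced(fwd, rev))
    # 0.5 60 60 True
    print(ideal_copies(100, 30))  # 107374182400
```

For longer primers (the text allows more than 30 bases), a nearest-neighbor thermodynamic model would give more realistic melting temperatures than the Wallace rule.
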
EXAMPLE 35 

Use of the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids as 
probes 

[0367] Probes derived from EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be labeled with detectable labels familiar to those skilled in the art, including radioisotopes and non-radioactive labels, to provide a detectable probe. The detectable probe may be single stranded or double stranded and may be made using techniques known in the art, including in vitro transcription, nick translation, or kinase reactions. A nucleic acid sample containing a sequence capable of hybridizing to the labeled probe is contacted with the labeled probe. If the nucleic acid in the sample is double stranded, it may be denatured prior to contacting the probe. In some applications, the nucleic acid sample may be immobilized on a surface such as a nitrocellulose or nylon membrane. The nucleic acid sample may comprise nucleic acids obtained from a variety of sources, including genomic DNA, cDNA libraries, RNA, or tissue samples.

[0368] Procedures used to detect the presence of nucleic acids capable of hybridizing to the detectable probe include well known techniques such as Southern blotting, Northern blotting, dot blotting, colony hybridization, and plaque hybridization. In some applications, the nucleic acid capable of hybridizing to the labeled probe may be cloned into vectors such as expression vectors, sequencing vectors, or in vitro transcription vectors to facilitate the characterization and expression of the hybridizing nucleic acids in the sample. For example, such techniques may be used to isolate and clone sequences in a genomic library or cDNA library which are capable of hybridizing to the detectable probe as described in Example 18 above.
[0369] PCR primers made as described in Example 34 above may be used in forensic analyses, such as the DNA fingerprinting techniques described in Examples 36-40 below. Such analyses may utilize detectable probes or primers based on the sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids.

EXAMPLE 36 

Forensic Matching by DNA Sequencing 

[0370] In one exemplary method, DNA samples are isolated from forensic specimens of, for example, hair, semen, blood or skin cells by conventional methods. A panel of PCR primers based on a number of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is then utilized in accordance with Example 34 to amplify DNA of approximately 100-200 bases in length from the forensic specimen. Corresponding sequences are obtained from a test subject. Each of these identification DNAs is then sequenced using standard techniques, and a simple database comparison determines the differences, if any, between the sequences from the subject and those from the sample. Statistically significant differences between the suspect's DNA sequences and those from the sample conclusively prove a lack of identity. This lack of identity can be proven, for example, with only one sequence. Identity, on the other hand, should be demonstrated with a large number of sequences, all matching. Preferably, a minimum of 50 statistically identical sequences of 100 bases in length are used to prove identity between the suspect and the sample.
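
The database comparison described above can be sketched in Python as a locus-by-locus check. This is a deliberately simplified model and an assumption on our part: each locus is represented by one already-aligned sequence of equal length, any base difference at a shared locus is treated as an exclusion (ignoring sequencing error and heterozygosity), and a match is reported only when at least 50 shared loci agree, mirroring the preferred minimum stated above. The locus names and data layout are hypothetical.

```python
def mismatch_positions(a, b):
    """0-based positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


def compare_profiles(subject, specimen, min_loci=50):
    """Compare two {locus: sequence} profiles.

    Returns "excluded" if any shared locus differs, "match" if at least
    min_loci shared loci are all identical, and "inconclusive" otherwise.
    """
    shared = sorted(set(subject) & set(specimen))
    for locus in shared:
        if mismatch_positions(subject[locus], specimen[locus]):
            return "excluded"
    return "match" if len(shared) >= min_loci else "inconclusive"


if __name__ == "__main__":
    # Hypothetical 100-base sequences at 50 loci.
    subject = {f"locus{i}": "ACGT" * 25 for i in range(50)}
    specimen = dict(subject)
    print(compare_profiles(subject, specimen))  # match
    specimen["locus7"] = "T" + specimen["locus7"][1:]
    print(compare_profiles(subject, specimen))  # excluded
```

The asymmetry in the text is visible here: a single differing locus is enough to exclude, whereas a match requires many loci to agree.
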

EXAMPLE 37 

Positive Identification by DNA Sequencing

[0371] The technique outlined in the previous example may also be used on a larger scale to provide a unique fingerprint-type identification of any individual. In this technique, primers are prepared from a large number of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Preferably, 20 to 50 different primers are used. These primers are used to obtain a corresponding number of PCR-generated DNA segments from the individual in question in accordance with Example 34. Each of these DNA segments is sequenced, using the methods set forth in Example 36. The database of sequences generated through this procedure uniquely identifies the individual from whom the sequences were obtained. The same panel of primers may then be used at any later time to absolutely correlate tissue or other biological specimens with that individual.

EXAMPLE 38 

Southern Blot Forensic Identification 

[0372] The procedure of Example 37 is repeated to obtain a panel of at least 10 amplified sequences from an individual and a specimen. Preferably, the panel contains at least 50 amplified sequences. More preferably, the panel contains 100 amplified sequences. In some embodiments, the panel contains 200 amplified sequences. This PCR-generated DNA is then digested with one or a combination of, preferably, four-base-specific restriction enzymes. Such enzymes are commercially available and known to those of skill in the art. After digestion, the resultant gene fragments are size separated in multiple duplicate wells on an agarose gel and transferred to nitrocellulose using Southern blotting techniques well known to those with skill in the art. For a review of Southern blotting, see Davis et al. (Basic Methods in Molecular Biology, 1986, Elsevier Press, pp. 62-65).
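
The effect of digesting the PCR-generated DNA with four-base cutters can be previewed in silico. The Python sketch below cuts a linear sequence at the recognition sites of a few common four-base enzymes and returns the fragment lengths that would be size separated on the gel. It considers the top strand only, and the small enzyme table and the example sequence are illustrative assumptions; a real analysis would draw on a curated enzyme database.

```python
import re

# Recognition site and top-strand cut offset (bases from the 5' end of the site)
# for a few common four-base cutters.
FOUR_BASE_CUTTERS = {
    "AluI": ("AGCT", 2),    # AG^CT
    "HaeIII": ("GGCC", 2),  # GG^CC
    "MboI": ("GATC", 0),    # ^GATC
}


def digest(sequence, enzymes):
    """Fragment lengths, in order along the sequence, after cutting a linear
    DNA sequence with every enzyme named in `enzymes`."""
    seq = sequence.upper()
    cuts = set()
    for name in enzymes:
        site, offset = FOUR_BASE_CUTTERS[name]
        # The lookahead lets overlapping occurrences of the site be found.
        for match in re.finditer(f"(?={site})", seq):
            position = match.start() + offset
            if 0 < position < len(seq):
                cuts.add(position)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    amplicon = "AAAAGCTTTTGGCCAAAA"  # hypothetical 18-base amplicon
    print(digest(amplicon, ["AluI"]))            # [5, 13]
    print(digest(amplicon, ["AluI", "HaeIII"]))  # [5, 7, 6]
```

Combining enzymes simply pools their cut positions, which is why a combination of four-base cutters yields more, smaller fragments and hence a more informative band pattern.
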

[0373] A panel of probes based on the sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is radioactively or colorimetrically labeled using methods known in the art, such as nick translation or end labeling, and hybridized to the Southern blot using techniques known in the art (Davis et al., supra). Preferably, the probes are at least 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 nucleotides in length. In some embodiments, the probes are oligonucleotides which are 40 nucleotides in length or less.

[0374] Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least about 20 or 30 are used to provide a unique pattern. The resultant bands appearing from the hybridization of a large sample of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids will be a unique identifier. Since the restriction enzyme cleavage will be different for every individual, the band pattern on the Southern blot will also be unique. Increasing the number of probes will provide a statistically higher level of confidence in the identification since there will be an increased number of sets of bands used for identification.

EXAMPLE 39 

Dot Blot Identification Procedure 

[0375] Another technique for identifying individuals using the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids disclosed herein utilizes a dot blot hybridization technique.
[0376] Genomic DNA is isolated from the nuclei of the subject to be identified. Probes are prepared that correspond to at least 10, preferably 50, sequences from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. The probes are used to hybridize to the genomic DNA through conditions known to those in the art. The oligonucleotides are end labeled with 32P using polynucleotide kinase (Pharmacia). Dot blots are created by spotting the genomic DNA onto nitrocellulose or the like using a vacuum dot blot manifold (BioRad, Richmond, California). The genomic sequences are baked or UV linked to the nitrocellulose filter, which is then prehybridized and hybridized with labeled probe using techniques known in the art (Davis et al., supra). The 32P-labeled DNA fragments are sequentially hybridized with successively more stringent conditions to detect minimal differences between the 30 bp sequence and the DNA. Tetramethylammonium chloride is useful for identifying clones containing small numbers of nucleotide mismatches (Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985)). A unique pattern of dots distinguishes one individual from another individual.
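
Scoring a dot blot reduces each individual to a pattern of present or absent signals, one per probe, after the final stringency wash. A minimal Python sketch of comparing two such patterns follows; the representation as a dictionary from probe identifier to a yes/no call, and the probe names, are hypothetical.

```python
def differing_dots(pattern_a, pattern_b):
    """Probe IDs whose hybridization call differs between two dot-blot patterns.

    Each pattern maps probe ID -> True (signal retained after the final
    stringency wash) or False (no signal).
    """
    if set(pattern_a) != set(pattern_b):
        raise ValueError("patterns must be scored with the same probe panel")
    return sorted(p for p in pattern_a if pattern_a[p] != pattern_b[p])


if __name__ == "__main__":
    person_1 = {"probe01": True, "probe02": False, "probe03": True}
    person_2 = {"probe01": True, "probe02": True, "probe03": False}
    print(differing_dots(person_1, person_2))  # ['probe02', 'probe03']
```

An empty result means the two patterns cannot be distinguished with that panel; scoring more probes (the text prefers 50) lowers the chance that two different individuals share an identical pattern.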

[0377] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids can be used as probes in the following alternative fingerprinting technique. In some embodiments, the probes are oligonucleotides which are 40 nucleotides in length or less.
[0378] Preferably, a plurality of probes having sequences from different EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are used in the alternative fingerprinting technique. Example 40 below provides a representative alternative fingerprinting procedure in which the probes are derived from EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids.

EXAMPLE 40

Alternative "Fingerprint" Identification Technique

[0379] Oligonucleotides are prepared from a large number, e.g. 50, 100, or 200, of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids using commercially available oligonucleotide services such as Genset, Paris, France. Preferably, the oligonucleotides are at least 10, 15, 18, 20, 23, 25, 28, or 30 nucleotides in length. However, in some embodiments, the oligonucleotides may be more than 30 nucleotides in length.

[0380] Cell samples from the test subject are processed for DNA using techniques well known to those with skill in the art. The nucleic acid is digested with restriction enzymes such as EcoRI and XbaI. Following digestion, samples are applied to wells for electrophoresis. The procedure, as known in the art, may be modified to accommodate polyacrylamide electrophoresis; however, in this example, samples containing 5 µg of DNA are loaded into wells and separated on 0.8% agarose gels. The gels are transferred onto nitrocellulose using standard Southern blotting techniques.
[0381] 10 ng of each of the oligonucleotides are pooled and end-labeled with 32P. The nitrocellulose is prehybridized with blocking solution and hybridized with the labeled probes. Following hybridization and washing, the nitrocellulose filter is exposed to X-Omat AR X-ray film. The resulting hybridization pattern will be unique for each individual.

[0382] It is additionally contemplated within this example that the number of probe sequences used can be varied for additional accuracy or clarity.
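Once each autoradiogram has been scored, comparing two individuals reduces to comparing two sets of bands. The sketch below assumes each band is recorded as a hypothetical `(probe_id, fragment_size_kb)` pair and uses the Dice coefficient as the similarity score; neither the scoring scheme nor the coefficient is specified in the procedure above, and the probe names and fragment sizes are made up.

```python
from typing import FrozenSet, Tuple

# Hypothetical band representation: (probe identifier, fragment size in kb).
Band = Tuple[str, float]


def dice_similarity(bands_a: FrozenSet[Band], bands_b: FrozenSet[Band]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two scored band sets."""
    if not bands_a and not bands_b:
        return 1.0  # two empty patterns are trivially identical
    shared = len(bands_a & bands_b)
    return 2.0 * shared / (len(bands_a) + len(bands_b))


def patterns_match(bands_a: FrozenSet[Band], bands_b: FrozenSet[Band],
                   threshold: float = 1.0) -> bool:
    """Treat two patterns as the same individual only if similarity >= threshold."""
    return dice_similarity(bands_a, bands_b) >= threshold


# Made-up scored patterns for a test subject and a reference sample.
subject = frozenset({("probe01", 4.2), ("probe02", 1.8), ("probe03", 6.0)})
reference = frozenset({("probe01", 4.2), ("probe02", 2.3), ("probe03", 6.0)})

print(f"similarity = {dice_similarity(subject, reference):.3f}")  # 2*2/6
print("same individual:", patterns_match(subject, reference))
```

Requiring an exact match (threshold 1.0) mirrors the statement that the pattern is unique to each individual; a lower threshold that tolerates scoring errors would be a further assumption.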
[0383] In addition to their applications in forensics and identification, EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be mapped to their chromosomal locations. Example 41 below describes radiation hybrid (RH) mapping of human chromosomal regions using EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Example 42 below describes a representative procedure for mapping EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to their locations on human chromosomes. Example 43 below describes mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids on metaphase chromosomes by Fluorescence In Situ Hybridization (FISH).

2. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in Chromosome Mapping

EXAMPLE 41 

Radiation hybrid mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to the human genome

[0384] Radiation hybrid (RH) mapping is a somatic cell genetic approach that can be used for high resolution mapping of the human genome. In this approach, cell lines containing one or more human chromosomes are lethally irradiated, breaking each chromosome into fragments whose size depends on the radiation dose. These fragments are rescued by fusion with cultured rodent cells, yielding subclones containing different portions of the human genome. This technique is described by Benham et al. (Genomics 4:509-517, 1989) and Cox et al. (Science 250:245-250, 1990). The random and independent nature of the subclones permits efficient mapping of any human genome marker. Human DNA isolated from a panel of 80-100 cell lines provides a mapping reagent for ordering EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. In this approach, the frequency of breakage between markers is used to measure distance, allowing construction of fine resolution maps as has been done using conventional ESTs (Schuler et al., Science 274:540-546, 1996).
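The breakage-frequency idea can be made concrete with the usual two-point RH model, which is an assumption here rather than a method recited above: if each fragment is retained independently with probability r, a hybrid retains exactly one of two markers with probability 2r(1 - r)θ, where θ is the probability of a break between them, and distance is reported as D = -ln(1 - θ) Rays, or 100·D centirays (cR). The sketch assumes both markers share one retention frequency and that scoring is error-free; the 10-hybrid panel is made up (a real panel has 80-100 lines).

```python
import math
from typing import Sequence, Tuple


def rh_two_point(marker_a: Sequence[int], marker_b: Sequence[int]) -> Tuple[float, float]:
    """Estimate (theta, distance in cR) from 1/0 retention scores on the same hybrids."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("markers must be scored on the same non-empty panel")
    n = len(marker_a)
    # Pooled retention frequency r, assumed equal for both markers.
    retention = (sum(marker_a) + sum(marker_b)) / (2 * n)
    # Fraction of hybrids retaining exactly one of the two markers.
    discordant = sum(1 for a, b in zip(marker_a, marker_b) if a != b) / n
    denom = 2 * retention * (1 - retention)
    if denom == 0:
        raise ValueError("retention of 0 or 1 carries no distance information")
    # P(discordant) = 2r(1 - r) * theta; cap theta below 1 so the log stays finite.
    theta = min(discordant / denom, 0.999999)
    distance_cr = -100.0 * math.log(1.0 - theta)
    return theta, distance_cr


# Hypothetical scores for two markers on a 10-hybrid panel (1 = retained).
est_marker = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
framework_marker = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]

theta, dist = rh_two_point(est_marker, framework_marker)
print(f"theta = {theta:.2f}, distance = {dist:.1f} cR")  # theta = 0.40, distance = 51.1 cR
```

Multipoint RH software orders many markers at once and models unequal retention; this two-point estimate only illustrates how co-retention translates into distance.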

[0385] RH mapping has been used to generate a high-resolution whole genome radiation hybrid map of human chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine kinase (TK) (Foster et al., Genomics 33:185-192, 1996), the region surrounding the Gorlin syndrome gene (Obermayr et al., Eur. J. Hum. Genet. 4:242-245, 1996), 60 loci covering the entire short arm of chromosome 12 (Raeymaekers et al., Genomics 29:170-178, 1995), the region of human chromosome 22 containing the neurofibromatosis type 2 locus (Frazer et al., Genomics 14:574-584, 1992) and 13 loci on the long arm of chromosome 5 (Warrington et al., Genomics 11:701-708, 1991).

EXAMPLE 42 

Mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to Human Chromosomes using PCR techniques

[0386] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be assigned to human chromosomes using PCR-based methodologies. In such approaches, oligonucleotide primer pairs are designed from EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to minimize the chance of amplifying through an intron. Preferably, the oligonucleotide primers are 18-23 nucleotides in length and are designed for PCR amplification. The creation of PCR primers from known sequences is well known to those with skill in the art. For a review of PCR technology, see Erlich, in PCR Technology: Principles and Applications for DNA Amplification, 1992, W.H. Freeman and Co., New York.
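The expected product size used in the length comparison of [0387] is the span from the 5' end of the forward primer to the far end of the reverse primer's binding site on the EST. A minimal sketch, assuming the EST is given 5' to 3' on the sense strand and the reverse primer is written 5' to 3' as ordered (so its binding site is its reverse complement); the helper names and sequences are made up, and the 18-23 nucleotide check follows the preference stated above.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_amplicon_length(template: str, forward: str, reverse: str,
                             min_len: int = 18, max_len: int = 23) -> int:
    """Length of the product spanning both primer sites on the template."""
    for name, primer in (("forward", forward), ("reverse", reverse)):
        if not min_len <= len(primer) <= max_len:
            raise ValueError(f"{name} primer is {len(primer)} nt, expected {min_len}-{max_len}")
    template = template.upper()
    fwd = forward.upper()
    start = template.find(fwd)
    if start < 0:
        raise ValueError("forward primer not found on template")
    # The reverse primer anneals where its reverse complement sits on the sense strand.
    site = reverse_complement(reverse.upper())
    end = template.find(site, start + len(fwd))
    if end < 0:
        raise ValueError("reverse primer site not found downstream of forward primer")
    return end + len(site) - start


# Made-up EST fragment: forward site + 60 nt spacer + reverse-primer binding site.
fwd_site = "ATGCGTACGTTAGCCATGCA"
rev_site = "GGATCCAAGCTTGCATGCAG"
est = "CC" + fwd_site + "TTGACC" * 10 + rev_site + "TT"
reverse_primer = reverse_complement(rev_site)

print(expected_amplicon_length(est, fwd_site, reverse_primer))  # 20 + 60 + 20 = 100
```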

[0387] The primers are used in polymerase chain reactions (PCR) to amplify templates from total human genomic DNA. PCR conditions are as follows: 60 ng of genomic DNA is used as a template for PCR with 80 ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of a 32P-labeled deoxycytidine triphosphate. The PCR is performed in a microplate thermocycler (Techne) under the following conditions: 30 cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final extension at 72°C for 10 min. The amplified products are analyzed on a 6% polyacrylamide sequencing gel and visualized by autoradiography. If the length of the resulting PCR product is identical to the distance between the ends of the primer sequences in the 5' EST from which the primers are derived, then the PCR reaction is repeated with DNA templates from two panels of human-rodent somatic cell hybrids, BIOS PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ).
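The cycling program above can be written down as data, which also makes the total block time easy to check. This is a sketch only: the `Step` and `Program` names are hypothetical, and the total counts hold times only, ignoring temperature ramps and any instrument-specific settings of the Techne cycler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees C
    minutes: float  # hold time at that temperature


@dataclass(frozen=True)
class Program:
    cycles: int
    cycle_steps: Tuple[Step, ...]
    final_steps: Tuple[Step, ...]

    def hold_minutes(self) -> float:
        """Total hold time in minutes, excluding temperature ramps."""
        per_cycle = sum(step.minutes for step in self.cycle_steps)
        return self.cycles * per_cycle + sum(step.minutes for step in self.final_steps)


# The stated conditions: 30 x (94 C 1.4 min; 55 C 2 min; 72 C 2 min), then 72 C for 10 min.
EST_MAPPING_PCR = Program(
    cycles=30,
    cycle_steps=(Step(94, 1.4), Step(55, 2.0), Step(72, 2.0)),
    final_steps=(Step(72, 10.0),),
)

print(f"{EST_MAPPING_PCR.hold_minutes():.1f} min")  # 30 * 5.4 + 10 = 172.0 min
```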
[0388] PCR is used to screen a series of somatic cell hybrid cell lines containing defined sets of human chromosomes for the presence of a given 5' EST. DNA is isolated from the somatic hybrids and used as starting templates for PCR reactions using the primer pairs from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Only those somatic cell hybrids with chromosomes containing the human gene corresponding to the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids will yield an amplified fragment. The 5' ESTs are assigned to a chromosome by analysis of the segregation pattern of PCR products from the somatic hybrid DNA templates. The single human chromosome present in all cell hybrids that give rise to an amplified fragment is the chromosome containing that EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid. For a review of techniques and analysis of results from somatic cell gene mapping experiments, see Ledbetter et al., Genomics 6:475-481 (1990).
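The segregation analysis described above amounts to intersecting chromosome contents: keep the chromosomes present in every PCR-positive hybrid and absent from every PCR-negative one. A minimal sketch, assuming error-free PCR scoring and hybrids that carry whole chromosomes; the function name is hypothetical and the four-hybrid panel is made up, not the BIOS or NIGMS panel contents.

```python
from typing import Dict, FrozenSet, List, Set


def assign_chromosome(panel: Dict[str, FrozenSet[str]],
                      pcr_positive: Dict[str, bool]) -> List[str]:
    """Chromosomes present in every positive hybrid and absent from every negative one."""
    positives = [h for h, hit in pcr_positive.items() if hit]
    negatives = [h for h, hit in pcr_positive.items() if not hit]
    if not positives:
        return []  # no amplification anywhere: nothing to assign
    candidates: Set[str] = set.intersection(*(set(panel[h]) for h in positives))
    for h in negatives:
        candidates -= panel[h]
    return sorted(candidates)


# Hypothetical panel: human chromosomes retained by each hybrid cell line.
panel = {
    "H1": frozenset({"3", "7", "12"}),
    "H2": frozenset({"7", "X"}),
    "H3": frozenset({"5", "12"}),
    "H4": frozenset({"7", "12", "21"}),
}
# Hypothetical PCR results with one EST primer pair.
pcr = {"H1": True, "H2": False, "H3": True, "H4": True}

print(assign_chromosome(panel, pcr))  # ['12']
```

An empty result, or more than one candidate, would indicate that the panel cannot resolve the assignment or that some PCR scores are wrong.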

[0389] Alternatively, the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be mapped to individual chromosomes using FISH as described in Example 43 below.

EXAMPLE 43 

Mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to Chromosomes Using Fluorescence In Situ Hybridization

[0390] Fluorescence in situ hybridization allows the 
EST-related nucleic acids, positional segments of EST- 
related nucleic acids or fragments of positional seg- 
ments of EST-related nucleic acids to be mapped to a 
particular location on a given chromosome. The chro- 
mosomes to be used for fluorescence in situ hybridiza- 
tion techniques may be obtained from a variety of sourc- 
es including cell cultures, tissues, or whole blood. 
[0391] In a preferred embodiment, chromosomal localization of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is obtained by FISH as described by Cherif et al. (Proc. Natl. Acad. Sci. U.S.A., 87:6639-6643, 1990). Metaphase chromosomes are prepared from phytohemagglutinin (PHA)-stimulated blood cells from donors. PHA-stimulated lymphocytes from healthy males are cultured for 72 h in RPMI-1640 medium. For synchronization, methotrexate (10 µM) is added for 17 h, followed by addition of 5-bromodeoxyuridine (5-BrdU, 0.1 mM) for 6 h. Colcemid (1 µg/ml) is added for the last 15 min before harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic solution of KCl (75 mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid (3:1). The cell suspension is dropped onto a glass slide and air dried. The EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid is labeled with biotin-16-dUTP by nick translation according to the manufacturer's instructions (Bethesda Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 column (Pharmacia, Uppsala, Sweden) and precipitated. Just prior to hybridization, the DNA pellet is dissolved in hybridization buffer (50% formamide, 2X SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon sperm DNA, pH 7) and the probe is denatured at 70°C for 5-10 min.

[0392] Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 µg/ml), rinsed three times in 2X SSC and dehydrated in an ethanol series. Chromosome preparations are denatured in 70% formamide, 2X SSC for 2 min at 70°C, then dehydrated at 4°C. The slides are treated with proteinase K (10 ng/100 ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37°C for 8 min and dehydrated. The hybridization mixture containing the probe is placed on the slide, covered with a coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After hybridization and post-hybridization washes, the biotinylated probe is detected by avidin-FITC and amplified with additional layers of biotinylated goat anti-avidin and avidin-FITC. For chromosomal localization, fluorescent R-bands are obtained as previously described (Cherif et al., supra). The slides are observed under a LEICA fluorescence microscope (DMRXA). Chromosomes are counterstained with propidium iodide and the fluorescent signal of the probe appears as two symmetrical yellow-green spots on both chromatids of the fluorescent R-band chromosome (red). Thus, a particular EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid may be localized to a particular cytogenetic R-band on a given chromosome. Once the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids have been assigned to particular chromosomes using the techniques described in Examples 41-43 above, they may be utilized to construct a high resolution map of the chromosomes on which they are located or to identify the chromosomes in a sample.

EXAMPLE 44 

Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to Construct or Expand Chromosome Maps

[0393] Chromosome mapping involves assigning a given unique sequence to a particular chromosome as described above. Once the unique sequence has been mapped to a given chromosome, it is ordered relative to other unique sequences located on the same chromosome. One approach to chromosome mapping utilizes a series of yeast artificial chromosomes (YACs) bearing several thousand long inserts derived from the chromosomes of the organism from which the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are obtained. This approach is described in Ramaiah Nagaraja et al., Genome Research 7:210-222, March 1997. Briefly, in this approach each chromosome is broken into overlapping pieces which are inserted into the YAC vector. The YAC inserts are screened using PCR or other methods to determine whether they include the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids whose position is to be determined. Once an insert has been found which includes the 5' EST, the insert can be analyzed by PCR or other methods to determine whether the insert also contains other sequences known to be on the chromosome or in the region from which the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids were derived. This process can be repeated for each insert in the YAC library to determine the location of each of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids relative to one another and to other known chromosomal markers. In this way, a high resolution map of the distribution of numerous unique markers along each of the organism's chromosomes may be obtained.
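The bookkeeping behind the YAC approach can be sketched as grouping markers that are found together in at least one insert; each connected group is a contig. This is a minimal union-find sketch, assuming the PCR screening results are already tabulated per YAC; the function, YAC and marker names are hypothetical, and it recovers contig membership only, not the order of markers within a contig.

```python
from typing import Dict, List, Set


def build_contigs(yac_hits: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group markers into contigs; markers found in a common YAC insert are linked."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for markers in yac_hits.values():
        if not markers:
            continue  # a YAC with no screened markers links nothing
        for marker in markers:
            parent.setdefault(marker, marker)
        first, *rest = sorted(markers)
        for marker in rest:
            union(first, marker)

    groups: Dict[str, Set[str]] = {}
    for marker in parent:
        groups.setdefault(find(marker), set()).add(marker)
    return sorted(groups.values(), key=min)


# Hypothetical PCR screening results: markers detected in each YAC insert.
yac_hits = {
    "YAC_A": {"EST_1", "STS_A"},
    "YAC_B": {"STS_A", "EST_2"},
    "YAC_C": {"EST_3"},
    "YAC_D": set(),
}
print(build_contigs(yac_hits))
```

Here EST_1 and EST_2 fall into one contig because both share a YAC with STS_A, while EST_3 remains unlinked until an overlapping insert is found.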

[0394] As described in Example 45 below, EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may also be used to identify genes associated with a particular phenotype, such as hereditary disease or drug response.

3. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in Gene Identification

EXAMPLE 45 

Identification of genes associated with hereditary diseases or drug response

[0395] This example illustrates an approach useful for the association of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids with particular phenotypic characteristics. In this example, a particular EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid is used as a test probe to associate that EST-related nucleic acid, positional segment or fragment with a particular phenotypic characteristic.
[0396] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are mapped to a particular location on a human chromosome using techniques such as those described in Examples 41 and 42 or other techniques known in the art. A search of Mendelian Inheritance in Man (V. McKusick, Mendelian Inheritance in Man, available online through the Johns Hopkins University Welch Medical Library) reveals the region of the human chromosome which contains the EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid to be a very gene-rich region containing several known genes and several diseases or phenotypes for which genes have not been identified. The gene corresponding to this EST-related nucleic acid, positional segment or fragment thus becomes an immediate candidate for each of these genetic diseases.
[0397] Cells from patients with these diseases or phenotypes are isolated and expanded in culture. PCR primers from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are used to screen genomic DNA, mRNA or cDNA obtained from the patients. EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids that are not amplified in the patients can be positively associated with a particular disease by further analysis. Alternatively, the PCR analysis may yield fragments of different lengths when the samples are derived from an individual having the phenotype associated with the disease than when the sample is derived from a healthy individual, indicating that the gene containing the EST-related nucleic acid, positional segment or fragment may be responsible for the genetic disease.

VI. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to Construct Vectors

[0398] The present EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may also be used to construct secretion vectors capable of directing the secretion of the proteins encoded by genes inserted therein. Such secretion vectors may facilitate the purification or enrichment of the proteins encoded by genes inserted therein by reducing the number of background proteins from which the desired protein must be purified or enriched. Exemplary secretion vectors are described in Example 46 below.

1. Construction of secretion vectors

EXAMPLE 46 

Construction of Secretion Vectors

[0399] The secretion vectors of the present invention 
include a promoter capable of directing gene expression 
in the host cell, tissue, or organism of interest. Such pro- 
moters include the Rous Sarcoma Virus promoter, the 
SV40 promoter, the human cytomegalovirus promoter,
and other promoters familiar to those skilled in the art. 
[0400] A signal sequence from one of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is operably linked to the promoter such that the mRNA transcribed from the promoter will direct the translation of the signal peptide. Preferably, the signal sequence is from one of the nucleic acids of SEQ ID NOs: 24-4100. The host cell, tissue, or organism may be any cell, tissue, or organism which recognizes the signal peptide encoded by the signal sequence in the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Suitable hosts include mammalian cells, tissues or organisms, avian cells, tissues, or organisms, insect cells, tissues or organisms, or yeast.

[0401] In addition, the secretion vector contains cloning sites for inserting genes encoding the proteins which are to be secreted. The cloning sites facilitate the cloning of the insert gene in frame with the signal sequence such that a fusion protein in which the signal peptide is fused to the protein encoded by the inserted gene is expressed from the mRNA transcribed from the promoter. The signal peptide directs the extracellular secretion of the fusion protein.
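Whether a given cloning site keeps the insert in frame with the signal sequence can be checked mechanically: the signal-sequence coding region plus any linker must be a multiple of three nucleotides, and no stop codon may occur before the end of the fused reading frame. The sketch below is a hypothetical helper with made-up sequences (the leader shown is not one of the SEQ ID NOs); it does not check promoter strength, translation-initiation context, or signal-peptide cleavage.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_fusion_ok(signal_cds: str, linker: str, insert_orf: str) -> bool:
    """True if signal sequence + linker + insert read as one open frame from ATG."""
    fused = (signal_cds + linker + insert_orf).upper()
    if not fused.startswith("ATG"):
        return False  # translation must start at the signal sequence's ATG
    if (len(signal_cds) + len(linker)) % 3 != 0:
        return False  # the insert would be read in a shifted frame
    codons = [fused[i:i + 3] for i in range(0, len(fused) - len(fused) % 3, 3)]
    # A stop codon is tolerated only as the very last codon of the fusion.
    return all(codon not in STOP_CODONS for codon in codons[:-1])


signal = "ATGAAAGCTCTGCTGGCT"  # made-up 6-codon leader
insert = "GGTTCTAAAGGTTAA"     # made-up insert ending in a stop codon

print(in_frame_fusion_ok(signal, "GAATTC", insert))   # True: 6-nt linker keeps the frame
print(in_frame_fusion_ok(signal, "GAATTCA", insert))  # False: 7-nt linker shifts it
```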

[0402] The secretion vector may be DNA or RNA and may integrate into the chromosome of the host, be stably maintained as an extrachromosomal replicon in the host, be an artificial chromosome, or be transiently present in the host. Preferably, the secretion vector is maintained in multiple copies in each host cell. As used herein, multiple copies means at least 2, 5, 10, 20, 25, 50 or more than 50 copies per cell. In some embodiments, the multiple copies are maintained extrachromosomally. In other embodiments, the multiple copies result from amplification of a chromosomal sequence.
[0403] Many nucleic acid backbones suitable for use as secretion vectors are known to those skilled in the art, including retroviral vectors, SV40 vectors, Bovine Papilloma Virus vectors, yeast integrating plasmids, yeast episomal plasmids, yeast artificial chromosomes, human artificial chromosomes, P element vectors, baculovirus vectors, or bacterial plasmids capable of being transiently introduced into the host.
[0404] The secretion vector may also contain a polyA signal such that the polyA signal is located downstream of the gene inserted into the secretion vector.
[0405] After the gene encoding the protein for which secretion is desired is inserted into the secretion vector, the secretion vector is introduced into the host cell, tissue, or organism using calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection, viral particles or as naked DNA. The protein encoded by the inserted gene is then purified or enriched from the supernatant using conventional techniques such as ammonium sulfate precipitation, immunoprecipitation, immunoaffinity chromatography, size exclusion chromatography, ion exchange chromatography, and HPLC. Alternatively, the secreted protein may be in a sufficiently enriched or pure state in the supernatant or growth media of the host to permit it to be used for its intended purpose without further enrichment.
[0406] The signal sequences may also be inserted into vectors designed for gene therapy. In such vectors, the signal sequence is operably linked to a promoter such that mRNA transcribed from the promoter encodes the signal peptide. A cloning site is located downstream of the signal sequence such that a gene encoding a protein whose secretion is desired may readily be inserted into the vector and fused to the signal sequence. The vector is introduced into an appropriate host cell. The protein expressed from the promoter is secreted extracellularly, thereby producing a therapeutic effect.



EP 1 033 401 A2

EXAMPLE 47

Fusion Vectors

[0407] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to construct fusion vectors for the expression of chimeric polypeptides. The chimeric polypeptides comprise a first polypeptide portion and a second polypeptide portion. In the fusion vectors of the present invention, nucleic acids encoding the first polypeptide portion and the second polypeptide portion are joined in frame with one another so as to generate a nucleic acid encoding the chimeric polypeptide. The nucleic acid encoding the chimeric polypeptide is operably linked to a promoter which directs the expression of an mRNA encoding the chimeric polypeptide. The promoter may be in any of the expression vectors described herein, including those described in Examples 20 and 46.
[0408] Preferably, the fusion vector is maintained in multiple copies in each host cell. In some embodiments, the multiple copies are maintained extrachromosomally. In other embodiments, the multiple copies result from amplification of a chromosomal sequence.
[0409] The first polypeptide portion may comprise any of the polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. In some embodiments, the first polypeptide portion may be one of the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides.
[0410] The second polypeptide portion may comprise any polypeptide of interest. In some embodiments, the second polypeptide portion may comprise a detectable polypeptide, such as green fluorescent protein, or a polypeptide having a detectable enzymatic activity, such as β-galactosidase. Chimeric polypeptides in which the second polypeptide portion comprises a detectable polypeptide may be used to determine the intracellular localization of the first polypeptide portion. In such procedures, the fusion vector encoding the chimeric polypeptide is introduced into a host cell under conditions which facilitate the expression of the chimeric polypeptide. Where appropriate, the cells are treated with a detection reagent which is visible under the microscope following a catalytic reaction with the detectable polypeptide, and the cellular location of the detection reagent is determined. For example, if the polypeptide having a detectable enzymatic activity is β-galactosidase, the cells may be treated with X-gal. Alternatively, where the detectable polypeptide is directly detectable without the addition of a detection reagent, the intracellular location of the chimeric polypeptide is determined by performing microscopy under conditions in which the detectable polypeptide is visible. For example, if the detectable polypeptide is green fluorescent protein or a modified version thereof, microscopy is performed by exposing the host cells to light having an appropriate wavelength to cause the green fluorescent protein or modified version thereof to fluoresce.
[0411] Alternatively, the second polypeptide portion may comprise a polypeptide whose isolation, purification, or enrichment is desired. In such embodiments, the isolation, purification, or enrichment of the second polypeptide portion may be achieved by performing the immunoaffinity chromatography procedures described below using an immunoaffinity column having an antibody directed against the first polypeptide portion coupled thereto.

[0412] The proteins encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides, may also be used to generate antibodies as explained in Examples 20 and 33 in order to identify the tissue type or cell species from which a sample is derived, as described in Example 48.

EXAMPLE 48 

Identification of Tissue Types or Cell Species by Means 
of Labeled Tissue Specific Antibodies 

[0413] Identification of specific tissues is accomplished by the visualization of tissue specific antigens by means of antibody preparations according to Examples 20 and 33 which are conjugated, directly or indirectly, to a detectable marker. Selected labeled antibody species bind to their specific antigen binding partner in tissue sections, cell suspensions, or in extracts of soluble proteins from a tissue sample to provide a pattern for qualitative or semi-quantitative interpretation.
[0414] Antisera for these procedures must have a potency exceeding that of the native preparation, and for that reason, antibodies are concentrated to a mg/ml level by isolation of the gamma globulin fraction, for example, by ion-exchange chromatography or by ammonium sulfate fractionation. Also, to provide the most specific antisera, unwanted antibodies, for example to common proteins, must be removed from the gamma globulin fraction, for example by means of insoluble immunoabsorbents, before the antibodies are labeled with the marker. Either monoclonal or heterologous antisera are suitable for either procedure.

1. Immunohistochemical Techniques

[0415] Purified, high-titer antibodies, prepared as described above, are conjugated to a detectable marker, as described, for example, by Fudenberg, H., Chap. 26 in: Basic and Clinical Immunology, 3rd Ed., Lange, Los Altos, California (1980) or Rose et al., Chap. 12 in: Methods in Immunodiagnosis, 2d Ed., John Wiley and Sons, New York (1980).

[0416] A fluorescent marker, either fluorescein or rhodamine, is preferred, but antibodies can also be labeled with an enzyme that supports a color-producing reaction with a substrate, such as horseradish peroxidase. Markers can be added to tissue-bound antibody in a second step, as described below. Alternatively, the specific antitissue antibodies can be labeled with ferritin or other electron-dense particles, and localization of the ferritin-coupled antigen-antibody complexes achieved by means of an electron microscope. In yet another approach, the antibodies are radiolabeled with, for example, 125I, and detected by overlaying the antibody-treated preparation with photographic emulsion.
[0417] Preparations to carry out the procedures can comprise monoclonal or polyclonal antibodies to a single protein or peptide identified as specific to a tissue type, for example, brain tissue, or antibody preparations to several antigenically distinct tissue specific antigens can be used in panels, independently or in mixtures, as required.

[0418] Tissue sections and cell suspensions are prepared for immunohistochemical examination according to common histological techniques. Multiple cryostat sections (about 4 µm, unfixed) of the unknown tissue and known control are mounted and each slide covered with different dilutions of the antibody preparation. Sections of known and unknown tissues should also be treated with preparations to provide a positive control, a negative control (for example, pre-immune sera), and a control for non-specific staining (for example, buffer).
[0419] Treated sections are incubated in a humid chamber for 30 min at room temperature, rinsed, then washed in buffer for 30-45 min. Excess fluid is blotted away, and the marker developed.
[0420] If the tissue specific antibody was not labeled in the first incubation, it can be labeled at this time in a second antibody-antibody reaction, for example, by adding fluorescein- or enzyme-conjugated antibody against the immunoglobulin class of the antiserum-producing species, for example, fluorescein-labeled antibody to mouse IgG. Such labeled sera are commercially available.

[0421] The antigen found in the tissues by the above procedure can be quantified by measuring the intensity of color or fluorescence on the tissue section, and calibrating that signal using appropriate standards.
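Calibrating the measured signal against standards can be done with an ordinary least-squares line through standards of known antigen content, which is then inverted for the unknown. A minimal sketch, assuming the signal is linear in antigen amount over the range of the standards; the function names are hypothetical, and units and all numbers are made up.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standards must differ in amount")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def antigen_from_signal(signal: float, std_amounts: Sequence[float],
                        std_signals: Sequence[float]) -> float:
    """Invert the calibration line to estimate antigen amount from a measured signal."""
    slope, intercept = fit_line(std_amounts, std_signals)
    return (signal - intercept) / slope


# Made-up standards (antigen amount, arbitrary units) and their measured intensities.
amounts = [0.0, 10.0, 20.0, 40.0]
intensities = [5.0, 25.0, 45.0, 85.0]

print(antigen_from_signal(65.0, amounts, intensities))  # (65 - 5) / 2 = 30.0
```

Signals outside the range of the standards would require extrapolation, which this linear model does not justify.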

2. Identification of Tissue Specific Soluble Proteins 

[0422] The visualization of tissue specific proteins and identification of unknown tissues from that procedure is carried out using the labeled antibody reagents and detection strategy as described for immunohistochemistry; however, the sample is prepared according to an electrophoretic technique to distribute the proteins extracted from the tissue in an orderly array on the basis of molecular weight for detection.
[0423] A tissue sample is homogenized using a Virtis apparatus; cell suspensions are disrupted by Dounce homogenization or osmotic lysis, using detergents in either case as required to disrupt cell membranes, as is the practice in the art. Insoluble cell components such as nuclei, microsomes, and membrane fragments are removed by ultracentrifugation, and the soluble protein-containing fraction is concentrated if necessary and reserved for analysis.
[0424] A sample of the soluble protein solution is resolved into individual protein species by conventional SDS polyacrylamide gel electrophoresis as described, for example, by Davis, L. et al., Section 19-2 in: Basic Methods in Molecular Biology (P. Leder, ed.), Elsevier, New York (1986), using a range of polyacrylamide concentrations in a set of gels to resolve the entire molecular weight range of proteins to be detected in the sample. A size marker is run in parallel to estimate the molecular weights of the constituent proteins. A convenient sample size for analysis is a volume of 5 to 55 µl containing from about 1 to 100 µg of protein. An aliquot of each of the resolved proteins is transferred by blotting to a nitrocellulose filter, a process that maintains the pattern of resolution. Multiple copies are prepared. The procedure, known as Western blot analysis, is well described in Davis, L. et al., supra, Section 19-3. One set of nitrocellulose blots is stained with Coomassie Blue dye to visualize the entire set of proteins for comparison with the antibody-bound proteins. The remaining nitrocellulose filters are then incubated with a solution of one or more specific antisera to tissue specific proteins prepared as described in Examples 20 and 33. In this procedure, as in procedure A above, appropriate positive and negative sample and reagent controls are run.

[0425] In either procedure described above, a detectable label can be attached to the primary tissue antigen-primary antibody complex according to various strategies and permutations thereof. In a straightforward approach, the primary specific antibody can be labeled; alternatively, the unlabeled complex can be bound by a labeled secondary anti-IgG antibody. In other approaches, either the primary or secondary antibody is conjugated to a biotin molecule, which can, in a subsequent step, bind an avidin-conjugated marker. According to yet another strategy, enzyme-labeled or radioactive protein A, which has the property of binding to IgG, is bound in a final step to either the primary or secondary antibody.

EXAMPLE 49 

Immunohistochemical Localization of Polypeptides

[0426] The antibodies prepared as described in Examples 20 and 33 above may be used to determine the cellular location of a polypeptide. The polypeptide may be any of the polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or the polypeptide may be one of the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. In some embodiments, the polypeptide may be a chimeric polypeptide such as those encoded by the fusion vectors of Example 47.
[0427] Cells expressing the polypeptide to be localized are applied to a microscope slide and fixed using any of the procedures typically employed in immunohistochemical localization techniques, including the methods described in Current Protocols in Molecular Biology, John Wiley and Sons, Inc. 1997. Following a washing step, the cells are contacted with the antibody. In some embodiments, the antibody is conjugated to a detectable marker as described above to facilitate detection. Alternatively, in some embodiments, after the cells have been contacted with an antibody to the polypeptide to be localized, a secondary antibody which has been conjugated to a detectable marker is placed in contact with the antibody against the polypeptide to be localized.
[0428] Thereafter, microscopy is performed under 
conditions suitable for visualizing the cellular location of 
the polypeptide. 

[0429] The visualization of tissue specific antigen binding, at levels above those seen in control tissues, to one or more tissue specific antibodies directed against the polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or to antibodies against the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides, can identify tissues of unknown origin, for example, forensic samples, or differentiated tumor tissue that has metastasized to foreign bodily sites.
[0430] The antibodies of Examples 20 and 33 may also be used in the immunoaffinity chromatography techniques described below to isolate, purify or enrich the polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or to isolate, purify or enrich EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. The immunoaffinity chromatography techniques described below may also be used to isolate, purify or enrich polypeptides which have been linked to the polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or to isolate, purify or enrich polypeptides which have been linked to EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides.

EXAMPLE 50 

Immunoaffinity Chromatography

[0431] Antibodies prepared as described above are coupled to a support. Preferably, the antibodies are monoclonal antibodies, but polyclonal antibodies may also be used. The support may be any of those typically employed in immunoaffinity chromatography, including Sepharose CL-4B (Pharmacia, Piscataway, NJ), Sepharose CL-2B (Pharmacia, Piscataway, NJ), Affi-Gel 10 (Bio-Rad, Richmond, CA), or glass beads.
[0432] The antibodies may be coupled to the support using any of the coupling reagents typically used in immunoaffinity chromatography, including cyanogen bromide. After coupling the antibody to the support, the support is contacted with a sample which contains a target polypeptide whose isolation, purification or enrichment is desired. The target polypeptide may be a polypeptide encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or the target polypeptide may be one of the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. The target polypeptides may also be polypeptides which have been linked, using the fusion vectors described above, to the polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or to EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides.
[0433] Preferably, the sample is placed in contact with the support for a sufficient amount of time and under appropriate conditions to allow at least 50% of the target polypeptide to bind specifically to the antibody coupled to the support.

[0434] Thereafter, the support is washed with an appropriate wash solution to remove polypeptides which have adhered non-specifically to the support. The wash solution may be any of those typically employed in immunoaffinity chromatography, including PBS, Tris-lithium chloride buffer (0.1 M lysine base and 0.5 M lithium chloride, pH 8.0), Tris-hydrochloride buffer (0.05 M Tris-hydrochloride, pH 8.0), or Tris/Triton/NaCl buffer (50 mM Tris-Cl, pH 8.0 or 9.0, 0.1% Triton X-100, and 0.5 M NaCl).
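Making up these wash buffers from solids is a mass = molarity × volume × molar mass calculation. The minimal Python sketch below covers the Tris/Triton/NaCl buffer; it assumes the buffer is prepared from Tris base and titrated to pH 8.0 or 9.0 with HCl, that the Triton X-100 percentage is v/v, and it uses standard molar masses (Tris base 121.14 g/mol, NaCl 58.44 g/mol, LiCl 42.39 g/mol). The helper names are illustrative.

```python
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,
    "NaCl": 58.44,
    "LiCl": 42.39,
}


def grams_needed(reagent: str, molarity_m: float, volume_l: float) -> float:
    """Mass of solid (g) = concentration (mol/L) x volume (L) x molar mass (g/mol)."""
    return molarity_m * volume_l * MOLAR_MASS_G_PER_MOL[reagent]


def tris_triton_nacl_recipe(volume_l: float = 1.0) -> dict[str, str]:
    """Tris/Triton/NaCl wash buffer: 50 mM Tris-Cl, 0.1% Triton X-100, 0.5 M NaCl."""
    return {
        "Tris base": f"{grams_needed('Tris base', 0.05, volume_l):.3f} g (then adjust pH with HCl)",
        "NaCl": f"{grams_needed('NaCl', 0.5, volume_l):.2f} g",
        # 0.1% (v/v) is 1 ml of detergent per litre of final volume.
        "Triton X-100": f"{0.001 * volume_l * 1000:.1f} ml",
        "water": f"to {volume_l * 1000:.0f} ml final volume",
    }


if __name__ == "__main__":
    for component, amount in tris_triton_nacl_recipe().items():
        print(f"{component:>12}: {amount}")
```

The same `grams_needed("LiCl", 0.5, 1.0)` call gives the lithium chloride for one litre of the Tris-lithium chloride buffer.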
[0435] After washing, the specifically bound target polypeptide is eluted from the support using the high pH or low pH elution solutions typically employed in immunoaffinity chromatography. In particular, the elution solutions may contain an eluant such as triethanolamine, diethylamine, calcium chloride, sodium thiocyanate, potassium bromide, acetic acid, or glycine. In some embodiments, the elution solution may also contain a detergent such as Triton X-100 or octyl-β-D-glucoside.
[0436] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may also be used to clone sequences located upstream of the 5' ESTs which are capable of regulating gene expression, including promoter sequences, enhancer sequences, and other upstream sequences which influence transcription or translation levels. Once identified and cloned, these upstream regulatory sequences may be used in expression vectors designed to direct the expression of an inserted gene in a desired spatial, temporal, developmental, or quantitative fashion. Example 51 describes a method for cloning sequences upstream of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids.

2. Identification of upstream sequences with promoting or regulatory activities

EXAMPLE 51 

Use of EST-related nucleic acids, positional segments 
of EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids to Clone 
Upstream Sequences from Genomic DNA 

[0437] Sequences derived from EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to isolate the promoters of the corresponding genes using chromosome walking techniques. In one chromosome walking technique, which uses the GenomeWalker™ kit available from Clontech, five complete genomic DNA samples are each digested with a different restriction enzyme which has a 6-base recognition site and leaves a blunt end. Following digestion, oligonucleotide adapters are ligated to each end of the resulting genomic DNA fragments.
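The digestion step can be modelled in silico to see which blunt-ended fragments a given genomic region would yield with each enzyme, for instance when judging which library is likely to carry a fragment of walkable length. The enzyme table in the minimal Python sketch below lists common blunt six-cutters chosen for illustration; it is an assumption, not the set of enzymes supplied with or specified by the kit. All of these sites are palindromic, so scanning one strand finds every site.

```python
# Illustrative blunt-cutting enzymes with 6-base palindromic recognition sites.
# Value: (recognition site, cut offset from the start of the site).
BLUNT_SIX_CUTTERS = {
    "DraI": ("TTTAAA", 3),
    "EcoRV": ("GATATC", 3),
    "PvuII": ("CAGCTG", 3),
    "ScaI": ("AGTACT", 3),
    "SspI": ("AATATT", 3),
    "StuI": ("AGGCCT", 3),
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """0-based positions of the blunt cuts made by `enzyme` in `seq`."""
    site, offset = BLUNT_SIX_CUTTERS[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)  # step by 1 so overlapping sites are found
    return cuts


def digest(seq: str, enzyme: str) -> list[str]:
    """Blunt-ended fragments left after complete digestion of a linear sequence."""
    seq = seq.upper()
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    region = "AAGATATCCCGATATCTT"  # toy sequence, not a real upstream region
    for name in BLUNT_SIX_CUTTERS:
        print(name, [len(f) for f in digest(region, name)])
```

In the walking procedure each such fragment would then carry an adapter on both ends, so the adapter primer and the gene specific primer can only amplify a fragment that contains the EST-derived sequence.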
[0438] For each of the five genomic DNA libraries, a first PCR reaction is performed according to the manufacturer's instructions using an outer adapter primer provided in the kit and an outer gene specific primer. The gene specific primer should be specific for the 5' EST of interest and should have a melting temperature, length, and location in the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids which are consistent with its use in PCR reactions. Each first PCR reaction contains 5 ng of genomic DNA, 5 µl of 10X Tth reaction buffer, 0.2 mM of each dNTP, 0.2 µM each of outer adapter primer and outer gene specific primer, 1.1 mM of Mg(OAc)2, and 1 µl of the Tth polymerase 50X mix in a total volume of 50 µl. The reaction cycle for the first PCR reaction is as follows: 1 min at 94°C / 2 sec at 94°C, 3 min at 72°C (7 cycles) / 2 sec at 94°C, 3 min at 67°C (32 cycles) / 5 min at 67°C.
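The reaction composition above translates into pipetting volumes with the usual C1V1 = C2V2 relation. The minimal Python sketch below reproduces the 5 µl of 10X buffer and 1 µl of 50X polymerase mix stated in the text; the stock concentrations for dNTPs (10 mM each), primers (10 µM) and Mg(OAc)2 (25 mM), and the 1 ng/µl genomic DNA stock, are hypothetical values chosen for illustration.

```python
TOTAL_UL = 50.0
DNA_NG = 5.0
DNA_STOCK_NG_PER_UL = 1.0  # hypothetical library concentration

# component: (final concentration, stock concentration), in matching units
COMPONENTS = {
    "10X Tth reaction buffer": (1.0, 10.0),       # X
    "dNTP mix (each dNTP)": (0.2, 10.0),          # mM, assumed 10 mM stock
    "outer adapter primer": (0.2, 10.0),          # µM, assumed 10 µM stock
    "outer gene specific primer": (0.2, 10.0),    # µM, assumed 10 µM stock
    "Mg(OAc)2": (1.1, 25.0),                      # mM, assumed 25 mM stock
    "Tth polymerase 50X mix": (1.0, 50.0),        # X
}


def reaction_volumes(total_ul: float = TOTAL_UL) -> dict[str, float]:
    """Volume of each stock (µl) from V_stock = C_final * V_total / C_stock."""
    volumes = {
        name: final * total_ul / stock for name, (final, stock) in COMPONENTS.items()
    }
    volumes["genomic DNA"] = DNA_NG / DNA_STOCK_NG_PER_UL
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested reaction volume")
    volumes["water"] = water
    return volumes


def master_mix(n_reactions: int, overage: float = 0.1) -> dict[str, float]:
    """Shared components for n reactions plus pipetting overage (template added per tube)."""
    per_reaction = reaction_volumes()
    per_reaction.pop("genomic DNA")  # each library receives its own template
    factor = n_reactions * (1 + overage)
    return {name: vol * factor for name, vol in per_reaction.items()}


if __name__ == "__main__":
    for name, vol in master_mix(5).items():  # one tube per genomic DNA library
        print(f"{name:>28}: {vol:6.2f} µl")
```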
[0439] The product of the first PCR reaction is diluted and used as a template for a second PCR reaction according to the manufacturer's instructions, using a pair of nested primers which are located internally on the amplicon resulting from the first PCR reaction. For example, 5 µl of the first PCR reaction product may be diluted 180 times. Reactions are made in a 50 µl volume having a composition identical to that of the first PCR reaction except that the nested primers are used. The first nested primer is specific for the adapter and is provided with the GenomeWalker™ kit. The second nested primer is specific for the particular EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids for which the promoter is to be cloned, and should have a melting temperature, length, and location in the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids which are consistent with its use in PCR reactions. The reaction parameters of the second PCR reaction are as follows: 1 min at 94°C / 2 sec at 94°C, 3 min at 72°C (6 cycles) / 2 sec at 94°C, 3 min at 67°C (25 cycles) / 5 min at 67°C. The product of the second PCR reaction is purified, cloned, and sequenced using standard techniques.

[0440] Alternatively, two or more human genomic DNA libraries can be constructed by using two or more restriction enzymes. The digested genomic DNA is cloned into vectors which can be converted into single stranded, circular, or linear DNA. A biotinylated oligonucleotide comprising at least 15 nucleotides from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids sequence is hybridized to the single stranded DNA. Hybrids between the biotinylated oligonucleotide and the single stranded DNA containing the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are isolated as described above. Thereafter, the single stranded DNA containing the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is released from the beads and converted into double stranded DNA using a primer specific for the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or a primer corresponding to a sequence included in the cloning vector. The resulting double stranded DNA is transformed into bacteria. cDNAs containing the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are identified by colony PCR or colony hybridization.

[0441] Once the upstream genomic sequences have been cloned and sequenced as described above, prospective promoters and transcription start sites within the upstream sequences may be identified by comparing the sequences upstream of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids with databases containing known transcription start sites, transcription factor binding sites, or promoter sequences.
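A first-pass comparison of an upstream sequence with known binding-site consensus sequences can be done by translating IUPAC degenerate codes into regular expressions and scanning both strands. The minimal Python sketch below does only that; it is not the database search referred to above, and the example CCAAT motif and test sequence are illustrative. A lookahead pattern is used so that overlapping hits are reported, and a palindromic motif will be reported once per strand.

```python
import re

# IUPAC degenerate nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_consensus(seq: str, consensus: str) -> list[tuple[int, str, str]]:
    """Return (0-based start on the + strand, strand, matched text) for every hit."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", consensus.upper()), ("-", reverse_complement(consensus))):
        body = "".join(IUPAC[base] for base in motif)
        for match in re.finditer(f"(?=({body}))", seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_consensus("GGCCAATGGATTGGCC", "CCAAT"))
```

A consensus match of this kind is weak evidence on its own; the reporter assays of Example 53 are what establish promoter activity.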

[0442] In addition, promoters in the upstream se- 
quences may be identified using promoter reporter vec- 
tors as described in Example 53. 
EXAMPLE 53

Identification of Promoters in Cloned Upstream Sequences

[0443] The genomic sequences upstream of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1 Promoter Reporter vectors available from Clontech. Briefly, each of these promoter reporter vectors includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable protein such as secreted alkaline phosphatase, β-galactosidase, or green fluorescent protein. The sequences upstream of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are inserted into the cloning sites upstream of the reporter gene in both orientations and introduced into an appropriate host cell. The level of reporter protein is assayed and compared to the level obtained from a vector which lacks an insert in the cloning site. An elevated expression level from the vector containing the insert, relative to the control vector, indicates the presence of a promoter in the insert. If necessary, the upstream sequences can be cloned into vectors which contain an enhancer for augmenting transcription levels from weak promoter sequences. A significant level of expression above that observed with the vector lacking an insert indicates that a promoter sequence is present in the inserted upstream sequence.
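The comparison with the insert-less vector comes down to a fold-change of reporter activity. The minimal Python sketch below averages replicate readings for each orientation of the insert and flags constructs whose activity exceeds the empty-vector level by a chosen factor; the 2-fold threshold, the readings, and the function names are hypothetical, and in practice the call would normally be backed by a statistical test across independent transfections.

```python
from statistics import mean


def fold_over_empty(insert_readings: list[float], empty_readings: list[float]) -> float:
    """Mean reporter signal with the insert divided by mean signal of the empty vector."""
    return mean(insert_readings) / mean(empty_readings)


def call_promoters(constructs: dict[str, list[float]], empty: list[float],
                   threshold: float = 2.0) -> dict[str, tuple[float, bool]]:
    """Fold-change per construct and whether it clears the (assumed) threshold."""
    results = {}
    for name, readings in constructs.items():
        fold = fold_over_empty(readings, empty)
        results[name] = (fold, fold >= threshold)
    return results


if __name__ == "__main__":
    empty_vector = [100.0, 110.0, 90.0]
    constructs = {
        "upstream insert, forward": [520.0, 480.0, 500.0],
        "upstream insert, reverse": [120.0, 100.0, 110.0],
    }
    for name, (fold, active) in call_promoters(constructs, empty_vector).items():
        print(f"{name}: {fold:.1f}-fold, promoter activity: {active}")
```

Testing both orientations helps separate an orientation-dependent promoter from an element that merely raises background expression from the vector.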
[0444] Appropriate host cells for the promoter reporter vectors may be chosen based on the results of the above described determination of expression patterns of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. For example, if the expression pattern analysis indicates that the mRNA corresponding to a particular EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid is expressed in fibroblasts, the promoter reporter vector may be introduced into a human fibroblast cell line.

[0445] Promoter sequences within the upstream genomic DNA may be further defined by constructing nested deletions in the upstream DNA using conventional techniques such as Exonuclease III digestion. The resulting deletion fragments can be inserted into the promoter reporter vector to determine whether the deletion has reduced or obliterated promoter activity. In this way, the boundaries of the promoters may be defined. If desired, potential individual regulatory sites within the promoter may be identified using site directed mutagenesis or linker scanning to obliterate potential transcription factor binding sites within the promoter individually or in combination. The effects of these mutations on transcription levels may be determined by inserting the mutated sequences into the cloning sites in the promoter reporter vectors.
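Reading a nested-deletion series amounts to finding where activity is lost as the 5' end approaches the transcription start site. The minimal Python sketch below assumes each construct is described by the position of its 5' end relative to the start site (negative values upstream) and by reporter activity normalised to the longest construct, and that activity stays low once an essential element is removed; the function name, threshold, and values are hypothetical.

```python
from typing import Optional


def map_5prime_boundary(constructs: list[tuple[int, float]],
                        threshold: float = 0.5) -> tuple[int, Optional[int]]:
    """Return (shortest active 5' end, nearest shorter inactive 5' end).

    An element needed for activity is expected to lie between these two
    deletion endpoints. None means no shorter construct lost activity.
    """
    active = [end for end, activity in constructs if activity >= threshold]
    if not active:
        raise ValueError("no construct retains activity above the threshold")
    shortest_active = max(active)  # 5' end closest to the start site that is still active
    shorter_inactive = [end for end, activity in constructs
                        if activity < threshold and end > shortest_active]
    return shortest_active, min(shorter_inactive, default=None)


if __name__ == "__main__":
    # Hypothetical Exonuclease III series: (5' end relative to start site, activity).
    series = [(-1200, 1.00), (-800, 0.95), (-400, 0.90), (-200, 0.15), (-100, 0.05)]
    upstream, downstream = map_5prime_boundary(series)
    print(f"activity-critical sequence lies between {upstream} and {downstream}")
```

If removing a repressor site makes activity rise partway through the series, this simple rule no longer describes the data, and the profile should be inspected construct by construct.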

EXAMPLE 54 

Cloning and Identification of Promoters

[0446] Using the method described in Example 51 above with 5' ESTs, sequences upstream of several genes were obtained. Using the primer pair GGG AAG ATG GAG ATA GTA TTG CCT G (SEQ ID NO: 15) and CTG CCA TGT ACA TGA TAG AGA GAT TC (SEQ ID NO: 16), the promoter having the internal designation P13H2 (SEQ ID NO: 17) was obtained.
[0447] Using the primer pair GTA CCA GGGG ACT GTG ACC ATT GC (SEQ ID NO: 18) and CTG TGA CCA TTG CTC CCA AGA GAG (SEQ ID NO: 19), the promoter having the internal designation P15B4 (SEQ ID NO: 20) was obtained.

[0448] Using the primer pair CTG GGA TGG AAG GCA CGG TA (SEQ ID NO: 21) and GAG ACC ACA CAG CTA GAC AA (SEQ ID NO: 22), the promoter having the internal designation P29B6 (SEQ ID NO: 23) was obtained.
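Whether the members of each primer pair have comparable melting temperatures can be checked quickly from composition. The minimal Python sketch below uses the basic GC-content approximation Tm = 64.9 + 41 × (G + C - 16.4) / N, which ignores salt, primer concentration and nearest-neighbour effects; it is only a rough screen, not the method used to design these primers. The sequences are the ones listed above with the spaces removed.

```python
PRIMERS = {
    "SEQ ID NO: 15": "GGG AAG ATG GAG ATA GTA TTG CCT G",
    "SEQ ID NO: 16": "CTG CCA TGT ACA TGA TAG AGA GAT TC",
    "SEQ ID NO: 18": "GTA CCA GGGG ACT GTG ACC ATT GC",
    "SEQ ID NO: 19": "CTG TGA CCA TTG CTC CCA AGA GAG",
    "SEQ ID NO: 21": "CTG GGA TGG AAG GCA CGG TA",
    "SEQ ID NO: 22": "GAG ACC ACA CAG CTA GAC AA",
}


def clean(seq: str) -> str:
    """Drop the codon-style spacing and normalise case."""
    return "".join(seq.split()).upper()


def gc_percent(seq: str) -> float:
    s = clean(seq)
    return 100.0 * sum(base in "GC" for base in s) / len(s)


def basic_tm(seq: str) -> float:
    """Basic GC-content Tm estimate (°C), meant for oligos longer than about 13 nt."""
    s = clean(seq)
    gc = sum(base in "GC" for base in s)
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(f"{name}: {len(clean(primer))} nt, {gc_percent(primer):.0f}% GC, "
              f"Tm ~ {basic_tm(primer):.1f} °C")
```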

[0449] Figure 4 provides a schematic description of the promoters isolated and the way they are assembled with the corresponding 5' tags. The upstream sequences were screened for the presence of motifs resembling transcription factor binding sites or known transcription start sites using the computer program MatInspector release 2.0, August 1996.

[0450] Figure 5 describes the transcription factor binding sites present in each of these promoters. The column labeled "matrice" provides the name of the MatInspector matrix used. The column labeled "position" provides the 5' position of the promoter site. Numeration of the sequence starts from the transcription start site as determined by matching the genomic sequence with the 5' EST sequence. The column labeled "orientation" indicates the DNA strand on which the site is found, with the + strand being the coding strand as determined by matching the genomic sequence with the sequence of the 5' EST. The column labeled "score" provides the MatInspector score found for this site. The column labeled "length" provides the length of the site in nucleotides. The column labeled "sequence" provides the sequence of the site found.
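The kind of table summarised in Figure 5 (position, orientation, score, length and sequence of each candidate site) can be produced by scanning a position weight matrix along both strands. The minimal Python sketch below is a generic log-odds scan with a min-max normalised score between 0 and 1; it is an illustration under stated assumptions, not MatInspector's own similarity measure or matrix library, and the toy matrix and 0.85 cut-off are hypothetical. Positions are the 0-based start of the window on the + strand, and minus-strand sites are reported 5' to 3' on that strand.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts: list[dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq: str, matrix: list[dict[str, float]], min_score: float = 0.85) -> list[dict]:
    """Score every window on both strands; keep windows with normalised score >= min_score."""
    width = len(matrix)
    best = sum(max(col.values()) for col in matrix)
    worst = sum(min(col.values()) for col in matrix)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            raw = sum(matrix[j][b] for j, b in enumerate(site))
            score = (raw - worst) / (best - worst)
            if score >= min_score:
                hits.append({"position": i, "orientation": strand,
                             "score": round(score, 3), "length": width,
                             "sequence": site})
    return hits
```

For example, a four-column matrix built from counts favouring T, A, T, A finds the perfect site in `"GGTATAGG"` at position 2 on both strands, because TATA is its own reverse complement.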

[0451] Bacterial clones containing plasmids that contain the promoter sequences described above are presently stored in the inventor's laboratories under the internal identification numbers provided above. The inserts may be recovered from the deposited materials by growing an aliquot of the appropriate bacterial clone in the appropriate medium. The plasmid DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art, such as alkaline lysis minipreps or large scale alkaline lysis plasmid isolation procedures. If desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done with primers designed at both ends of the inserted EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. The PCR product which corresponds to the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids can then be manipulated using standard cloning techniques familiar to those skilled in the art.

[0452] The promoters and other regulatory sequences located upstream of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to design expression vectors capable of directing the expression of an inserted gene in a desired spatial, temporal, developmental, or quantitative manner. A promoter capable of directing the desired spatial, temporal, developmental, and quantitative patterns may be selected using the results of the expression analysis described above. For example, if a promoter which confers a high level of expression in muscle is desired, the promoter sequence upstream of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids derived from an mRNA which is expressed at a high level in muscle, as determined by the methods above, may be used in the expression vector.

[0453] Preferably, the desired promoter is placed near multiple restriction sites to facilitate the cloning of the desired insert downstream of the promoter, such that the promoter is able to drive expression of the inserted gene. The promoter may be inserted in conventional nucleic acid backbones designed for extrachromosomal replication, integration into the host chromosomes or transient expression. Suitable backbones for the present expression vectors include retroviral backbones, backbones from eukaryotic episomes such as SV40 or Bovine Papilloma Virus, backbones from bacterial episomes, or artificial chromosomes.
[0454] Preferably, the expression vectors also include a polyA signal downstream of the multiple restriction sites for directing the polyadenylation of mRNA transcribed from the gene inserted into the expression vector.

[0455] Following the identification of promoter sequences using the procedures of Examples 51-54, proteins which interact with the promoter may be identified as described in Example 55 below.



EXAMPLE 55 

Identification of Proteins Which Interact with Promoter Sequences, Upstream Regulatory Sequences, or mRNA

[0456] Sequences within the promoter region which are likely to bind transcription factors may be identified by homology to known transcription factor binding sites or through conventional mutagenesis or deletion analyses of reporter plasmids containing the promoter sequence. For example, deletions may be made in a reporter plasmid containing the promoter sequence of interest operably linked to an assayable reporter gene. The reporter plasmids carrying various deletions within the promoter region are transfected into an appropriate host cell, and the effects of the deletions on expression levels are assessed. Transcription factor binding sites within the regions in which deletions reduce expression levels may be further localized using site directed mutagenesis, linker scanning analysis, or other techniques familiar to those skilled in the art.
[0457] Nucleic acids encoding proteins which interact with sequences in the promoter may be identified using one-hybrid systems such as those described in the manual accompanying the Matchmaker One-Hybrid System kit available from Clontech (Catalog No. K1603-1). Briefly, the Matchmaker One-Hybrid system is used as follows. The target sequence for which it is desired to identify binding proteins is cloned upstream of a selectable reporter gene and integrated into the yeast genome. Preferably, multiple copies of the target sequence are inserted into the reporter plasmid in tandem. A library comprised of fusions between cDNAs to be evaluated for the ability to bind to the promoter and the activation domain of a yeast transcription factor, such as GAL4, is transformed into the yeast strain containing the integrated reporter sequence. The yeast are plated on selective media to select cells expressing the selectable marker linked to the promoter sequence. The colonies which grow on the selective media contain genes encoding proteins which bind the target sequence. The inserts in the genes encoding the fusion proteins are further characterized by sequencing. In addition, the inserts may be inserted into expression vectors or in vitro transcription vectors. Binding of the polypeptides encoded by the inserts to the promoter DNA may be confirmed by techniques familiar to those skilled in the art, such as gel shift analysis or DNase protection analysis.

VII. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in Gene Therapy

[0458] The present invention also comprises the use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in gene therapy strategies, including antisense and triple helix strategies as described in Examples 56 and 57 below. In antisense approaches, nucleic acid sequences complementary to an mRNA are hybridized to the mRNA intracellularly, thereby blocking the expression of the protein encoded by the mRNA. The antisense sequences may prevent gene expression through a variety of mechanisms. For example, the antisense sequences may inhibit the ability of ribosomes to translate the mRNA. Alternatively, the antisense sequences may block transport of the mRNA from the nucleus to the cytoplasm, thereby limiting the amount of mRNA available for translation. Another mechanism through which antisense sequences may inhibit gene expression is by interfering with mRNA splicing. In yet another strategy, the antisense nucleic acid may be incorporated in a ribozyme capable of specifically cleaving the target mRNA.

EXAMPLE 56

Preparation and Use of Antisense Oligonucleotides 

[0459] The antisense nucleic acid molecules to be used in gene therapy may be either DNA or RNA sequences. They may comprise a sequence complementary to the sequence of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. The antisense nucleic acids should have a length and melting temperature sufficient to permit formation of an intracellular duplex with sufficient stability to inhibit the expression of the mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene therapy are disclosed in Green et al., Ann. Rev. Biochem. 55:569-597 (1986) and Izant and Weintraub, Cell 36:1007-1015 (1984).

[0460] In some strategies, antisense molecules are obtained from a nucleotide sequence encoding a protein by reversing the orientation of the coding region with respect to a promoter so as to transcribe the opposite strand from that which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript. Another approach involves transcription of the antisense nucleic acids in vivo by operably linking DNA containing the antisense sequence to a promoter in an expression vector.

[0461] Alternatively, oligonucleotides which are complementary to the strand normally transcribed in the cell may be synthesized in vitro. Thus, the antisense nucleic acids are complementary to the corresponding mRNA and are capable of hybridizing to the mRNA to create a duplex. In some embodiments, the antisense sequences may contain modified sugar phosphate backbones to increase stability and make them less sensitive to RNase activity. Examples of modifications suitable for use in antisense strategies are described by Rossi et al., Pharmacol. Ther. 50(2):245-254 (1991).
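Designing an oligonucleotide complementary to an mRNA comes down to taking the reverse complement of the target segment, written 5' to 3'. The minimal Python sketch below does that for DNA or RNA oligonucleotides and lists candidate windows within a GC-content band; the window length and the 40-60% GC band are hypothetical design filters, and a real design would also weigh target accessibility, self-complementarity and off-target matches.

```python
def antisense_oligo(mrna_segment: str, rna: bool = False) -> str:
    """Reverse complement (5'->3') of an mRNA segment, as DNA by default."""
    target = mrna_segment.upper().replace("T", "U")
    complement = target.translate(str.maketrans("ACGU", "UGCA"))[::-1]
    return complement if rna else complement.replace("U", "T")


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def candidates(mrna: str, length: int = 20, gc_min: float = 0.4,
               gc_max: float = 0.6) -> list[tuple[int, str]]:
    """(0-based start in the mRNA, antisense DNA oligo) for windows passing the GC filter."""
    out = []
    for i in range(len(mrna) - length + 1):
        oligo = antisense_oligo(mrna[i:i + length])
        if gc_min <= gc_fraction(oligo) <= gc_max:
            out.append((i, oligo))
    return out


if __name__ == "__main__":
    target = "AUGGCCUUA"  # toy mRNA fragment, not an EST-derived sequence
    print(antisense_oligo(target), antisense_oligo(target, rna=True))
    print(candidates(target, length=4, gc_min=0.5, gc_max=0.5))
```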
[0462] Various types of antisense oligonucleotides complementary to the sequence of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used. In one preferred embodiment, stable and semi-stable antisense oligonucleotides described in International Application No. PCT WO94/23026 are used. In these molecules, the 3' end or both the 3' and 5' ends are engaged in intramolecular hydrogen bonding between complementary base pairs. These molecules are better able to withstand exonuclease attacks and exhibit increased stability compared to conventional antisense oligonucleotides.
[0463] In another preferred embodiment, the antisense oligodeoxynucleotides against herpes simplex virus types 1 and 2 described in International Application No. WO 95/04141 are used.

[0464] In yet another preferred embodiment, the covalently cross-linked antisense oligonucleotides described in International Application No. WO 96/31523 are used. These double- or single-stranded oligonucleotides comprise one or more, respectively, inter- or intra-oligonucleotide covalent cross-linkages, wherein the linkage consists of an amide bond between a primary amine group of one strand and a carboxyl group of the other strand or of the same strand, respectively, the primary amine group being directly substituted in the 2' position of the strand nucleotide monosaccharide ring, and the carboxyl group being carried by an aliphatic spacer group substituted on a nucleotide or nucleotide analog of the other strand or the same strand, respectively.
[0465] The antisense oligodeoxynucleotides and oligonucleotides disclosed in International Application No. WO 92/18522 may also be used. These molecules are stable to degradation and contain at least one transcription control recognition sequence which binds to control proteins and are effective as decoys therefor. These molecules may contain "hairpin" structures, "dumbbell" structures, "modified dumbbell" structures, "cross-linked" decoy structures and "loop" structures.
[0466] In another preferred embodiment, the cyclic double-stranded oligonucleotides described in European Patent Application No. 0 572 287 A2 are used. These ligated oligonucleotide "dumbbells" contain the binding site for a transcription factor and inhibit expression of the gene under control of the transcription factor by sequestering the factor.

[0467] Use of the closed antisense oligonucleotides disclosed in International Application No. WO 92/19732 is also contemplated. Because these molecules have no free ends, they are more resistant to degradation by exonucleases than are conventional oligonucleotides. These oligonucleotides may be multifunctional, interacting with several regions which are not adjacent to the target mRNA.

[0468] The appropriate level of antisense nucleic acids required to inhibit gene expression may be determined using in vitro expression analysis. The antisense molecule may be introduced into the cells by diffusion, injection, infection or transfection using procedures known in the art. For example, the antisense nucleic acids can be introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated in lipid, oligonucleotide sequence encapsidated by viral protein, or as an oligonucleotide operably linked to a promoter contained in an expression vector. The expression vector may be any of a variety of expression vectors known in the art, including retroviral or viral vectors, vectors capable of extrachromosomal replication, or integrating vectors. The vectors may be DNA or RNA.
[0469] The antisense molecules are introduced onto cell samples at a number of different concentrations, preferably between 1x10^-10 M and 1x10^-4 M. Once the minimum concentration that can adequately control gene expression is identified, the optimized dose is translated into a dosage suitable for use in vivo. For example, an inhibiting concentration in culture of 1x10^-7 M translates into a dose of approximately 0.6 mg/kg bodyweight. Levels of oligonucleotide approaching 100 mg/kg bodyweight or higher may be possible after testing the toxicity of the oligonucleotide in laboratory animals. It is additionally contemplated that cells from the vertebrate are removed, treated with the antisense oligonucleotide, and reintroduced into the vertebrate.
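Identifying the minimum concentration that adequately controls gene expression can be sketched as a simple threshold search over a concentration series. The sketch assumes that residual target expression has been measured at each concentration and normalized to untreated cells. The 50% cutoff, the function name and the example values are illustrative assumptions, not figures from this specification, and the conversion to an in vivo dose is not modeled.

```python
from typing import Mapping, Optional


def minimum_effective_concentration(
    residual_expression: Mapping[float, float],
    max_residual: float = 0.5,
) -> Optional[float]:
    """Lowest tested molar concentration whose residual target expression
    (fraction of the untreated control, 0..1) is at or below `max_residual`.

    Returns None when no tested concentration meets the criterion.
    """
    effective = [conc for conc, residual in residual_expression.items()
                 if residual <= max_residual]
    return min(effective) if effective else None


if __name__ == "__main__":
    # Hypothetical screen: concentration (M) -> expression relative to untreated cells.
    screen = {1e-10: 0.98, 1e-9: 0.95, 1e-8: 0.80, 1e-7: 0.40,
              1e-6: 0.15, 1e-5: 0.05, 1e-4: 0.04}
    print(minimum_effective_concentration(screen))  # 1e-07
```

A stricter criterion, such as a lower `max_residual`, shifts the selected concentration upward. It can also return None when no tested dose suffices.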
[0470] It is further contemplated that the antisense oligonucleotide sequence is incorporated into a ribozyme sequence to enable the antisense to specifically bind and cleave its target mRNA. For technical applications of ribozyme and antisense oligonucleotides see Rossi et al., supra.

[0471] In a preferred application of this invention, the polypeptide encoded by the gene is first identified, so that the effectiveness of antisense inhibition on translation can be monitored using techniques that include but are not limited to antibody-mediated tests such as RIAs and ELISA, functional assays, or radiolabeling.
[0472] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may also be used in gene therapy approaches based on intracellular triple helix formation. Triple helix oligonucleotides are used to inhibit transcription from a genome. They are particularly useful for studying alterations in cell activity associated with a particular gene. The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids of the present invention or, more preferably, a portion of those sequences, can be used to inhibit gene expression in individuals having diseases associated with expression of a particular gene. Similarly, the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids can be used to study the effect of inhibiting transcription of a particular gene within a cell. Traditionally, homopurine sequences were considered the most useful for triple helix strategies. However, homopyrimidine sequences can also inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove at homopurine:homopyrimidine sequences. Thus, both types of sequences from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are contemplated within the scope of this invention.

EXAMPLE 57 

Preparation and use of Triple Helix Probes 


[0473] The sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are scanned to identify 10-mer to 20-mer homopyrimidine or homopurine stretches which could be used in triple-helix based strategies for inhibiting gene expression. Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in inhibiting gene expression is assessed by introducing varying amounts of oligonucleotides containing the candidate sequences into tissue culture cells which normally express the target gene. The oligonucleotides may be prepared on an oligonucleotide synthesizer or they may be purchased commercially from a company specializing in custom oligonucleotide synthesis, such as GENSET, Paris, France.
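The scan for 10-mer to 20-mer homopurine or homopyrimidine stretches can be sketched in Python with regular expressions. The sketch rests on three assumptions. First, only one strand is read; because a homopurine run on one strand is a homopyrimidine run on the other, scanning for both run types covers both strands. Second, U is treated as T. Third, runs longer than 20 bases are truncated to their first 20 bases as the candidate. The names are hypothetical.

```python
import re
from typing import List, NamedTuple


class Stretch(NamedTuple):
    start: int      # 0-based start of the maximal run in the scanned sequence
    length: int     # full length of the maximal run
    kind: str       # "homopurine" or "homopyrimidine"
    candidate: str  # first min(length, max_len) bases, proposed as a triplex target


_RUNS = {
    "homopurine": re.compile(r"[AG]+"),
    "homopyrimidine": re.compile(r"[CT]+"),
}


def find_triplex_targets(seq: str, min_len: int = 10, max_len: int = 20) -> List[Stretch]:
    """Report maximal homopurine/homopyrimidine runs of at least `min_len` bases."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for kind, pattern in _RUNS.items():
        # Greedy, non-overlapping matches are exactly the maximal runs.
        for m in pattern.finditer(seq):
            run = m.group()
            if len(run) >= min_len:
                hits.append(Stretch(m.start(), len(run), kind, run[:max_len]))
    return sorted(hits, key=lambda s: s.start)


if __name__ == "__main__":
    for hit in find_triplex_targets("A" * 12 + "C" + "TTTCTCTCTC" + "GA"):
        print(hit)
```

Each candidate would then be tested in cell culture as described above. The scan only proposes sequences and says nothing about how well they inhibit expression.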

[0474] The oligonucleotides may be introduced into the cells using a variety of methods known to those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection or native uptake.
[0475] Treated cells are monitored for altered cell function or reduced gene expression using techniques such as Northern blotting, RNase protection assays, or PCR based strategies to monitor the transcription levels of the target gene in cells which have been treated with the oligonucleotide. The cell functions to be monitored are predicted based upon the homologies between the target genes corresponding to the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids from which the oligonucleotides were derived and known gene sequences that have been associated with a particular function. The cell functions can also be predicted based on the presence of abnormal physiologies within cells derived from individuals with a particular inherited disease, particularly when the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are associated with the disease using techniques described herein.
[0476] The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells may then be introduced in vivo using the techniques described above and in Example 56 at a dosage calculated based on the in vitro results, as described in Example 56.
[0477] In some embodiments, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix formation see Griffin et al. (Science 245:967-971 (1989)).

EXAMPLE 58 

Use of EST-related nucleic acids, positional segments 
of EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids to express an 
Encoded Protein in a Host Organism 

[0478] The EST-related nucleic acids, positional seg- 
ments of EST-related nucleic acids or fragments of po- 
sitional segments of EST-related nucleic acids may also 
be used to express an encoded protein or polypeptide 
in a host organism to produce a beneficial effect. In ad- 
dition, nucleic acids encoding the EST-related polypep- 
tides, positional segments of EST-related polypeptides 
or fragments of positional segments of EST-related 
polypeptides may be used to express the encoded pro- 
tein or polypeptide in a host organism to produce a ben- 
eficial effect. 

[0479] In such procedures, the encoded protein or 
polypeptide may be transiently expressed in the host or- 
ganism or stably expressed in the host organism. The 
encoded protein or polypeptide may have any of the ac- 
tivities described above. The encoded protein or 
polypeptide may be a protein or polypeptide which the 
host organism lacks or, alternatively, the encoded pro- 
tein may augment the existing levels of the protein in the 
host organism. 

[0480] In some embodiments in which the protein or polypeptide is secreted, nucleic acids encoding the full length protein (i.e. the signal peptide and the mature protein), or nucleic acids encoding only the mature protein (i.e. the protein generated when the signal peptide is cleaved off), are introduced into the host organism.
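For the variant in which only the mature protein is encoded, the corresponding coding sequence can be derived from a full-length coding sequence once the signal peptide cleavage site is known. The sketch below makes three assumptions: the input is an in-frame CDS starting with ATG, the cleavage position is supplied by the user (for example from a signal peptide prediction), and an initiator ATG should be added in front of the mature sequence. These choices and the function name are illustrative.

```python
def mature_coding_sequence(full_cds: str, signal_peptide_aa: int, add_start: bool = True) -> str:
    """Drop the codons of an N-terminal signal peptide from an in-frame CDS.

    `signal_peptide_aa` is the number of residues (including the initiator Met)
    removed by signal peptidase; it is assumed to be known, e.g. from a prediction.
    """
    cds = full_cds.upper()
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        raise ValueError("expected an in-frame CDS beginning with ATG")
    cut = 3 * signal_peptide_aa
    if not 0 < cut < len(cds):
        raise ValueError("signal peptide length is outside the CDS")
    mature = cds[cut:]
    # Assumption: an initiator ATG is prepended so the mature protein can be translated.
    if add_start and not mature.startswith("ATG"):
        mature = "ATG" + mature
    return mature


if __name__ == "__main__":
    # Hypothetical CDS: 4-residue signal peptide (M A A A), then K and a stop codon.
    print(mature_coding_sequence("ATGGCTGCTGCTAAATGA", 4))  # ATGAAATGA
```

Whether an added initiator methionine is acceptable, or whether it should be removed post-translationally, depends on the protein and host.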
[0481] The nucleic acids encoding the proteins or polypeptides may be introduced into the host organism using a variety of techniques known to those of skill in the art. For example, the extended cDNA may be injected into the host organism as naked DNA such that the encoded protein is expressed in the host organism, thereby producing a beneficial effect.
[0482] Alternatively, the nucleic acids encoding the protein or polypeptide may be cloned into an expression vector downstream of a promoter which is active in the host organism. The expression vector may be any of the expression vectors designed for use in gene therapy, including viral or retroviral vectors. The expression vector may be directly introduced into the host organism such that the encoded protein is expressed in the host organism to produce a beneficial effect. In another approach, the expression vector may be introduced into cells in vitro. Cells containing the expression vector are thereafter selected and introduced into the host organism, where they express the encoded protein or polypeptide to produce a beneficial effect.


EXAMPLE 59 

Use of Signal Peptides To Import Proteins Into Cells 

[0483] The short core hydrophobic region (h) of signal peptides encoded by the sequences of SEQ ID NOs: 24-652 and 3721-3811 may also be used as a carrier to import a peptide or a protein of interest, so-called cargo, into tissue culture cells (Lin et al., J. Biol. Chem., 270: 14255-14258 (1995); Du et al., J. Peptide Res., 51: 235-243 (1998); Rojas et al., Nature Biotech., 16: 370-375 (1998)).

[0484] When cell permeable peptides of limited size (approximately up to 25 amino acids) are to be translocated across the cell membrane, chemical synthesis may be used in order to add the h region to either the C-terminus or the N-terminus of the cargo peptide of interest. Alternatively, when longer peptides or proteins are to be imported into cells, nucleic acids can be genetically engineered, using techniques familiar to those skilled in the art, in order to link the extended cDNA sequence encoding the h region to the 5' or the 3' end of a DNA sequence coding for a cargo polypeptide. Such genetically engineered nucleic acids are then translated either in vitro or in vivo after transfection into appropriate cells, using conventional techniques to produce the resulting cell permeable polypeptide. Suitable host cells are then simply incubated with the cell permeable polypeptide, which is then translocated across the membrane.
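At the peptide level, the fusion described above amounts to placing the h region at one terminus of the cargo. The choice between chemical synthesis and genetic engineering then depends on cargo length. In this sketch the 25-residue cutoff is applied to the cargo alone, which is an interpretation of the text. The default h region AAVALLPAVLLALLAP is only an illustrative hydrophobic sequence, assumed here, and should be replaced by the h region of the signal peptide actually used. Function names are hypothetical.

```python
# Illustrative hydrophobic core only (assumption); substitute the h region of interest.
EXAMPLE_H_REGION = "AAVALLPAVLLALLAP"


def fuse_h_region(cargo: str, h_region: str = EXAMPLE_H_REGION, terminus: str = "N") -> str:
    """Return the cell-permeable fusion with `h_region` at the N- or C-terminus of `cargo`."""
    if terminus == "N":
        return h_region + cargo
    if terminus == "C":
        return cargo + h_region
    raise ValueError("terminus must be 'N' or 'C'")


def suggested_route(cargo: str, synthesis_limit: int = 25) -> str:
    """Short cargoes (about 25 residues or fewer) -> chemical synthesis; longer -> expression."""
    if len(cargo) <= synthesis_limit:
        return "chemical synthesis"
    return "recombinant expression"


if __name__ == "__main__":
    cargo = "VQRKRQKLMP"  # hypothetical short cargo peptide
    print(fuse_h_region(cargo), suggested_route(cargo))
```

For the recombinant route, the same ordering applies at the DNA level. The h-region coding sequence is joined in frame to the 5' or 3' end of the cargo coding sequence.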

[0485] This method may be applied to study diverse intracellular functions and cellular processes. For instance, it has been used to probe functionally relevant domains of intracellular proteins and to examine protein-protein interactions involved in signal transduction pathways (Lin et al., supra; Lin et al., J. Biol. Chem., 271: 5305-5308 (1996); Rojas et al., J. Biol. Chem., 271: 27456-27461 (1996); Liu et al., Proc. Natl. Acad. Sci. USA, 93: 11819-11824 (1996); Rojas et al., Bioch. Biophys. Res. Commun., 234: 675-680 (1997)).

[0486] Such techniques may be used in cellular therapy to import proteins producing therapeutic effects. For instance, cells isolated from a patient may be treated with imported therapeutic proteins and then re-introduced into the host organism.

[0487] Alternatively, the h region of signal peptides of the present invention could be used in combination with a nuclear localization signal to deliver nucleic acids into the cell nucleus. Such oligonucleotides may be antisense oligonucleotides or oligonucleotides designed to form triple helixes, as described above, in order to inhibit processing and maturation of a target cellular RNA.

EP 1 033 401 A2

EXAMPLE 60

Computer Embodiments 

[0488] As used herein the term "nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681" encompasses the nucleotide sequences of SEQ ID NOs: 24-4100 and 8178-36681, fragments of SEQ ID NOs: 24-4100 and 8178-36681, nucleotide sequences homologous to SEQ ID NOs: 24-4100 and 8178-36681 or homologous to fragments of SEQ ID NOs: 24-4100 and 8178-36681, and sequences complementary to all of the preceding sequences. The fragments include portions of SEQ ID NOs: 24-4100 and 8178-36681 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive nucleotides of SEQ ID NOs: 24-4100 and 8178-36681. Preferably, the fragments are novel fragments. Homologous sequences and fragments of SEQ ID NOs: 24-4100 and 8178-36681 refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these sequences. Homology may be determined using any of the computer programs and parameters described in Example 18, including BLAST2N with the default parameters or with any modified parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error as described above. It will be appreciated that the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 can be represented in the traditional single character format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format which records the identity of the nucleotides in a sequence.
[0489] As used herein the term "polypeptide codes of SEQ ID NOs: 4101-8177" encompasses the polypeptide sequences of SEQ ID NOs: 4101-8177, which are encoded by the 5' ESTs of SEQ ID NOs: 24-4100 and 8178-36681, polypeptide sequences homologous to the polypeptides of SEQ ID NOs: 4101-8177, or fragments of any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to one of the polypeptide sequences of SEQ ID NOs: 4101-8177. Homology may be determined using any of the computer programs and parameters described herein, including FASTA with the default parameters or with any modified parameters. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error as described above. The polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of the polypeptides of SEQ ID NOs: 4101-8177. Preferably, the fragments are novel fragments. It will be appreciated that the polypeptide codes of SEQ ID NOs: 4101-8177 can be represented in the traditional single character format or three letter format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format which relates the identity of the amino acids in a sequence.
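Converting between the single character and three letter formats mentioned above is a simple table lookup. The sketch below covers only the 20 standard amino acids and uses hyphen-separated three letter codes, which is one common convention among several.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
ONE_LETTER = {three: one for one, three in THREE_LETTER.items()}


def to_three_letter(seq: str, sep: str = "-") -> str:
    """Convert a one-letter polypeptide string such as "MKV" to "Met-Lys-Val"."""
    return sep.join(THREE_LETTER[aa] for aa in seq.upper())


def to_one_letter(seq: str, sep: str = "-") -> str:
    """Convert "Met-Lys-Val" (case-insensitive) back to "MKV"."""
    if not seq:
        return ""
    return "".join(ONE_LETTER[code.capitalize()] for code in seq.split(sep))


if __name__ == "__main__":
    print(to_three_letter("MKV"))  # Met-Lys-Val
```

Nonstandard residues or ambiguity codes (for example X or B) would need extra table entries.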

[0490] It will be appreciated by those skilled in the art that the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 and polypeptide codes of SEQ ID NOs: 4101-8177 can be stored, recorded, and manipulated on any medium which can be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681, or one or more of the polypeptide codes of SEQ ID NOs: 4101-8177. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of SEQ ID NOs: 4101-8177.
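One example of recording sequences on a computer readable medium is the widely used FASTA text format, which the sketch below writes and reads. The choice of FASTA, the 60-character line width and the record names are assumptions for illustration. Any format that preserves the identity of each residue would serve.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write (identifier, sequence) pairs as FASTA text, wrapping at `width` characters."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
    write_fasta([("example_nucleic_acid", "ACGT" * 20), ("example_polypeptide", "MKV")], buffer)
    buffer.seek(0)
    for name, seq in read_fasta(buffer):
        print(name, len(seq))
```

Writing to a real medium only changes the handle. For example, a file opened with `open(path, "w")` can be passed to `write_fasta` and later reopened for `read_fasta`.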

[0491] Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disc, a floppy disc, a magnetic tape, CD-ROM, DVD, RAM, or ROM as well as other types of media known to those skilled in the art.
[0492] Embodiments of the present invention include systems, particularly computer systems which contain the sequence information described herein. As used herein, "a computer system" refers to the hardware components, software components, and data storage components used to analyze the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681, or the amino acid sequences of the polypeptide codes of SEQ ID NOs: 4101-8177. The computer system preferably includes the computer readable media described above, and a processor for accessing and manipulating the sequence data.
[0493] Preferably, the computer is a general purpose system that comprises a central processing unit (CPU), one or more data storage components for storing data, and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.
[0494] In one particular embodiment, the computer system includes a processor connected to a bus which is connected to a main memory (preferably implemented as RAM) and one or more data storage devices, such as a hard drive and/or other computer readable media having data recorded thereon. In some embodiments, the computer system further includes one or more data retrieving devices for reading the data stored on the data storage components. The data retrieving device may represent, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, etc. In some embodiments, the data storage component is a removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer system may advantageously include or be programmed by appropriate software for reading the control logic and/or the data from the data storage component once inserted in the data retrieving device. Software for accessing and processing the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681, or the amino acid sequences of the polypeptide codes of SEQ ID NOs: 4101-8177 (such as search tools, compare tools, and modeling tools etc.) may reside in main memory during execution.

[0495] In some embodiments, the computer system may further comprise a sequence comparer for comparing the above-described nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or polypeptide codes of SEQ ID NOs: 4101-8177 stored on a computer readable medium to reference nucleotide or polypeptide sequences stored on a computer readable medium. A "sequence comparer" refers to one or more programs which are implemented on the computer system to compare a nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences and/or compounds including but not limited to peptides, peptidomimetics, and chemicals stored within the data storage means. For example, the sequence comparer may compare the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681, or the amino acid sequences of the polypeptide codes of SEQ ID NOs: 4101-8177 stored on a computer readable medium to reference sequences stored on a computer readable medium to identify homologies, motifs implicated in biological function, or structural motifs. The various sequence comparer programs identified elsewhere in this patent specification are particularly contemplated for use in this aspect of the invention.
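A minimal sequence comparer for functional motifs can be sketched with a regular expression. The example encodes the PROSITE N-glycosylation pattern PS00001, N-{P}-[ST]-{P}. A practical tool would search a whole motif database rather than one hand-written pattern, and the function name here is hypothetical.

```python
import re
from typing import List, Tuple

# PROSITE PS00001 (N-glycosylation site) is written N-{P}-[ST]-{P}.
# The zero-width lookahead lets overlapping sites be reported; the motif is capture group 1.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein: str, pattern: "re.Pattern[str]" = N_GLYCOSYLATION) -> List[Tuple[int, str]]:
    """Return (1-based position, matched residues) for each occurrence of `pattern`.

    `pattern` must capture the motif in group 1, as N_GLYCOSYLATION does.
    """
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_motif("MNNSTA"))  # [(2, 'NNST'), (3, 'NSTA')]
```

A pattern match only flags a candidate site. Whether the site is actually glycosylated depends on context that the pattern does not capture.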
[0496] Accordingly, one aspect of the present invention is a computer system comprising a processor, a data storage device having stored thereon a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 or a polypeptide code of SEQ ID NOs: 4101-8177, a data storage device having retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 or polypeptide code of SEQ ID NOs: 4101-8177, and a sequence comparer for conducting the comparison. The sequence comparer may indicate a homology level between the sequences compared or identify structural motifs in the above described nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 and polypeptide codes of SEQ ID NOs: 4101-8177, or it may identify structural motifs in sequences which are compared to these nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may have stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or polypeptide codes of SEQ ID NOs: 4101-8177.

[0497] Another aspect of the present invention is a method for determining the level of homology between a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a reference nucleotide sequence, comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a computer program which determines homology levels and determining homology between the nucleic acid code and the reference nucleotide sequence with the computer program. The computer program may be any of a number of computer programs for determining homology levels, including those specifically enumerated herein, including BLAST2N with the default parameters or with any modified parameters. The method may be implemented using the computer systems described above. The method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 through use of the computer program and determining homology between the nucleic acid codes and reference nucleotide sequences.
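The specification determines homology with programs such as BLAST2N. To show what a homology level between two sequences means, the sketch below instead uses a plain Needleman-Wunsch global alignment. It reports percent identity as the number of identical aligned positions divided by the alignment length. The scoring values (+1, -1, -2) and this identity definition are assumptions. BLAST computes scores and identities differently, so the numbers are not interchangeable.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by alignment length (gaps included), times 100."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        return 0.0
    identical = sum(x == y for x, y in zip(aln_a, aln_b))
    return 100.0 * identical / len(aln_a)


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))      # ('ACGT', 'A-GT')
    print(percent_identity("ACGT", "AGT"))  # 75.0
```

The dynamic-programming table grows with the product of the two lengths. That is why database-scale searches rely on heuristic tools such as BLAST rather than exhaustive global alignment.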

[0498] Alternatively, the computer program may be a computer program which compares the nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide sequences in order to determine whether the nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 differs from a reference nucleic acid sequence at one or more positions. Optionally, such a program records the length and identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the reference polynucleotide or the nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681. In one embodiment, the computer program may be a program which determines whether the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 contain a single nucleotide polymorphism (SNP) with respect to a reference nucleotide sequence. This single nucleotide polymorphism may comprise a single base substitution, insertion, or deletion.
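Recording the length and identity of inserted, deleted or substituted nucleotides can be sketched with Python's standard difflib module. Its `SequenceMatcher.get_opcodes()` method reports replace, insert and delete blocks. SequenceMatcher is a general text-diff heuristic, not a biological aligner, so this is only an illustration; a real SNP caller would use a scored alignment and base-quality information. The class and function names are hypothetical, and a multi-base "replace" block is reported loosely as a substitution.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class Difference(NamedTuple):
    kind: str     # "substitution", "insertion" or "deletion", relative to the reference
    ref_pos: int  # 0-based start position in the reference
    ref: str      # reference bases involved ("" for an insertion)
    query: str    # query bases at that site ("" for a deletion)


_KINDS = {"replace": "substitution", "insert": "insertion", "delete": "deletion"}


def sequence_differences(reference: str, query: str) -> List[Difference]:
    """List the edits turning `reference` into `query`, as reported by difflib."""
    ref, qry = reference.upper(), query.upper()
    matcher = SequenceMatcher(None, ref, qry, autojunk=False)
    return [
        Difference(_KINDS[tag], i1, ref[i1:i2], qry[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def is_single_nucleotide_variant(diff: Difference) -> bool:
    """True for a one-base substitution, insertion or deletion."""
    return max(len(diff.ref), len(diff.query)) == 1


if __name__ == "__main__":
    for d in sequence_differences("ACGTACGT", "ACCTACGT"):
        print(d, is_single_nucleotide_variant(d))
```

Passing `autojunk=False` keeps difflib from discarding frequent characters as "junk". Its default heuristic would otherwise distort comparisons of long sequences built from only four letters.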

[0499] Another aspect of the present invention is a method for determining the level of homology between a polypeptide code of SEQ ID NOs: 4101-8177 and a reference polypeptide sequence, comprising the steps of reading the polypeptide code of SEQ ID NOs: 4101-8177 and the reference polypeptide sequence through use of a computer program which determines homology levels and determining homology between the polypeptide code and the reference polypeptide sequence using the computer program.
[0500] Accordingly, another aspect of the present invention is a method for determining whether a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 differs at one or more nucleotides from a reference nucleotide sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through use of a computer program which identifies differences between nucleic acid sequences and identifying differences between the nucleic acid code and the reference nucleotide sequence with the computer program. In some embodiments, the computer program is a program which identifies single nucleotide polymorphisms. The method may be implemented by the computer systems described above. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 and the reference nucleotide sequences through the use of the computer program and identifying differences between the nucleic acid codes and the reference nucleotide sequences with the computer program.

[0501] In other embodiments the computer-based system may further comprise an identifier for identifying features within the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the amino acid sequences of the polypeptide codes of SEQ ID NOs: 4101-8177.

[0502] An "identifier" refers to one or more programs which identify certain features within the above-described nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the amino acid sequences of the polypeptide codes of SEQ ID NOs: 4101-8177. In one embodiment, the identifier may comprise a program which identifies an open reading frame in the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681.
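An identifier that locates open reading frames can be sketched as follows. The sketch makes four assumptions. It scans the forward strand only and uses the standard genetic code (ATG start; TAA, TAG and TGA stops). The first ATG in each frame opens an ORF. ORFs that run off the 3' end without a stop codon are not reported, which matters for partial 5' EST sequences. The `min_codons` threshold and the names are illustrative.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


class ORF(NamedTuple):
    frame: int  # 0, 1 or 2 on the forward strand
    start: int  # 0-based index of the A of the ATG
    end: int    # 0-based index just past the stop codon


def find_orfs(seq: str, min_codons: int = 30) -> List[ORF]:
    """Find ATG...stop ORFs with at least `min_codons` codons (start and stop included)."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append(ORF(frame, start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs, key=lambda o: o.start)


if __name__ == "__main__":
    demo = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
    print(find_orfs(demo, min_codons=5))  # [ORF(frame=2, start=2, end=17)]
```

Scanning the reverse complement in the same way would extend the sketch to both strands.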

[0503] In another embodiment, the identifier may comprise a molecular modeling program which determines the 3-dimensional structure of the polypeptide codes of SEQ ID NOs: 4101-8177. In some embodiments, the molecular modeling program identifies target sequences that are most compatible with profiles representing the structural environments of the residues in known three-dimensional protein structures (see, e.g., Eisenberg et al., U.S. Patent No. 5,436,850 issued July 25, 1995). In another technique, the known three-dimensional structures of proteins in a given family are superimposed to define the structurally conserved regions in that family. This protein modeling technique also uses the known three-dimensional structure of a homologous protein to approximate the structure of the polypeptide codes of SEQ ID NOs: 4101-8177 (see, e.g., Srinivasan et al., U.S. Patent No. 5,557,535 issued September 17, 1996). Conventional homology modeling techniques have been used routinely to build models of proteases and antibodies (Sowdhamini et al., Protein Engineering 10:207-215 (1997)). Comparative approaches can also be used to develop three-dimensional protein models when the protein of interest has poor sequence identity to template proteins. In some cases, proteins fold into similar three-dimensional structures despite having very weak sequence identities. For example, a number of helical cytokines fold into a similar three-dimensional topology in spite of weak sequence homology.
[0504] The recent development of threading methods now enables the identification of likely folding patterns in a number of situations where the structural relatedness between target and template(s) is not detectable at the sequence level. Hybrid methods have also been described, in which fold recognition is performed using Multiple Sequence Threading (MST), structural equivalencies are deduced from the threading output using the distance geometry program DRAGON to construct a low-resolution model, and a full-atom representation is constructed using a molecular modeling package such as QUANTA.
[0505] According to this 3-step approach, candidate templates are first identified by using the novel fold recognition algorithm MST, which is capable of performing simultaneous threading of multiple aligned sequences onto one or more 3-D structures. In a second step, the structural equivalencies obtained from the MST output are converted into interresidue distance restraints and fed into the distance geometry program DRAGON, together with auxiliary information obtained from secondary structure predictions. The program combines the restraints in an unbiased manner and rapidly generates a large number of low-resolution model conformations. In a third step, these low-resolution model conformations are converted into full-atom models and subjected to energy minimization using the molecular modeling package QUANTA (see, e.g., Aszódi et al., Proteins: Structure, Function, and Genetics, Supplement 1:38-42 (1997)).

[0506] The results of the molecular modeling analysis may then be used in rational drug design techniques to identify agents which modulate the activity of the polypeptide codes of SEQ ID NOs: 4101-8177.
[0507] Accordingly, another aspect of the present invention is a method of identifying a feature within the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177 comprising reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program which identifies features therein and identifying features within the nucleic acid code(s) or polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a computer program which identifies open reading frames. In a further embodiment, the computer program identifies structural motifs in a polypeptide sequence. In another embodiment, the computer program comprises a molecular modeling program. The method may be performed by reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177 through the use of the computer program and identifying features within the nucleic acid codes or polypeptide codes with the computer program.
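A program which identifies open reading frames can be as simple as the following sketch. It assumes the stop codons of the standard genetic code, scans only the three forward frames (the reverse strand can be handled by scanning the reverse complement), ignores reading frames that run off the end without a stop codon, and uses an arbitrary minimum length. `find_orfs` and `min_codons` are hypothetical names.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Find ATG-initiated open reading frames on the forward strand.

    Returns a list of (frame, start, end) tuples, where start is the 0-based
    index of the ATG and end is the index just past the stop codon. The
    length test counts codons including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None
    return orfs
```

For instance, `find_orfs("CCATGAAATTTGGGTAACC", min_codons=2)` reports a single frame-2 reading frame running from the ATG at index 2 through the TAA stop codon.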
[0508] The nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177 may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177 may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT, or as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177. The following list is intended not to limit the invention but to provide guidance to programs and databases which are useful with the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177. The programs and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215: 403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85: 2444 (1988)), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database. Many other programs and databases would be apparent to one of skill in the art given the present disclosure.
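As a toy stand-in for the sequence comparers listed above (it is not BLAST, FASTA or FASTDB), the following sketch performs a Needleman-Wunsch global alignment with a linear gap penalty and then reports the columns in which the two aligned sequences differ, which is the kind of output a comparer that indicates polymorphisms would start from. The scoring values and the function names are illustrative assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); "-" marks a gap.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def differences(aligned_a, aligned_b):
    """Return (column, residue_a, residue_b) for every mismatched or gapped column."""
    return [(k, x, y) for k, (x, y) in enumerate(zip(aligned_a, aligned_b)) if x != y]
```

Aligning `"GATTACA"` with `"GATCACA"`, for example, yields an ungapped alignment with a single difference, a T/C substitution in column 3.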
[0509] Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices and beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
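Many of these motifs are described by consensus patterns. As one example, the PROSITE N-glycosylation site pattern PS00001, N-{P}-[ST]-{P}, can be searched with a regular expression. The sketch below assumes a one-letter amino acid string as input; the function name is hypothetical, and a match only indicates a potential site, not that the asparagine is actually glycosylated.

```python
import re

# PROSITE PS00001: Asn, any residue but Pro, Ser or Thr, any residue but Pro.
# The zero-width lookahead lets overlapping sites all be reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of the Asn in each N-glycosylation consensus."""
    return [m.start() + 1 for m in N_GLYCOSYLATION.finditer(protein.upper())]
```

In `"MNGSANPTKNVTA"`, for instance, the asparagines at positions 2 and 10 match, while the one at position 6 is rejected because it is followed by proline.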

EXAMPLE 61 

Methods of Making Nucleic Acids

[0510] The present invention also comprises methods of making the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of the EST-related nucleic acids, or fragments of positional segments of the EST-related nucleic acids. The methods comprise sequentially linking together nucleotides to produce the nucleic acids having the preceding sequences. A variety of methods of synthesizing nucleic acids are known to those skilled in the art.
[0511] In many of these methods, synthesis is conducted on a solid support. These include the 3' phosphoramidite methods, in which the 3' terminal base of the desired oligonucleotide is immobilized on an insoluble carrier. The nucleotide base to be added is blocked at the 5' hydroxyl and activated at the 3' hydroxyl so as to cause coupling with the immobilized nucleotide base. Deblocking of the new immobilized nucleotide compound and repetition of the cycle will produce the desired polynucleotide. Alternatively, polynucleotides may be prepared as described in U.S. Patent No. 5,049,656. In some embodiments, several polynucleotides prepared as described above are ligated together to generate longer polynucleotides having a desired sequence.
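Because the 3' terminal base is immobilized first and each cycle adds one base to the free 5' end of the growing chain, the bases of a target written 5' to 3' are coupled in reverse order. The sketch below lists that order and, assuming an idealized constant coupling efficiency per cycle (an illustrative parameter, not a value from this disclosure), estimates the fraction of full-length product as efficiency^(n-1) for an oligonucleotide of n bases. The function names are hypothetical.

```python
def coupling_order(target_5_to_3):
    """Bases in the order they are handled when synthesis proceeds 3' -> 5'.

    Element 0 is the 3'-terminal base pre-attached to the support; every
    later element corresponds to one coupling cycle at the free 5' end.
    """
    return list(reversed(target_5_to_3.upper()))


def full_length_fraction(length, coupling_efficiency):
    """Expected fraction of full-length product after length - 1 couplings,
    assuming the same efficiency in every cycle (an idealization)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)
```

With 99% coupling efficiency, for example, a 20-mer would be expected to be about 83% full length (0.99^19 ≈ 0.826), which illustrates why per-cycle efficiency matters more as the product gets longer.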

EXAMPLE 62 

Methods of Making Polypeptides

[0512] The present invention also comprises methods of making the polypeptides encoded by EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of the EST-related nucleic acids, or fragments of positional segments of the EST-related nucleic acids and methods of making the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. The methods comprise sequentially linking together amino acids to produce the polypeptides having the preceding sequences. In some embodiments, the polypeptides made by these methods are 150 amino acids or less in length. In other embodiments, the polypeptides made by these methods are 120 amino acids or less in length.
[0513] A variety of methods of making polypeptides are known to those skilled in the art, including methods in which the carboxyl terminal amino acid is bound to polyvinyl benzene or another suitable resin. The amino acid to be added possesses blocking groups on its amino moiety and any side chain reactive groups so that only its carboxyl moiety can react. The carboxyl group is activated with carbodiimide or another activating agent and allowed to couple to the immobilized amino acid. After removal of the blocking group, the cycle is repeated to generate a polypeptide having the desired sequence. Alternatively, the methods described in U.S. Patent No. 5,049,656 may be used.
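The cycle described above runs from the carboxyl terminus toward the amino terminus, so residues are added in the reverse of the usual N-to-C reading order. The hypothetical helper below turns a target sequence into that ordered list of steps and enforces a length limit, defaulting to the 150-residue embodiment of paragraph [0512]. It is a bookkeeping sketch, not a synthesizer protocol; the wording of each step, and the final deprotection and release step in particular, are illustrative assumptions.

```python
def spps_schedule(peptide_n_to_c, max_length=150):
    """List the steps of a C-to-N solid-phase synthesis for a peptide.

    peptide_n_to_c: one-letter sequence written N-terminus first.
    max_length: upper bound on length (150 or 120 in the embodiments above).
    """
    residues = peptide_n_to_c.upper()
    if not residues:
        raise ValueError("empty sequence")
    if len(residues) > max_length:
        raise ValueError(f"{len(residues)} residues exceeds the {max_length}-residue limit")
    order = residues[::-1]  # the carboxyl-terminal residue is handled first
    steps = [f"attach {order[0]} (C-terminus) to resin"]
    for aa in order[1:]:
        # Each incoming residue carries blocked amino and side-chain groups,
        # so only its activated carboxyl group can react.
        steps.append("remove N-terminal blocking group")
        steps.append(f"activate carboxyl of protected {aa} and couple")
    steps.append("remove remaining blocking groups and release peptide from resin")
    return steps
```

For the tripeptide `"GAV"`, the schedule starts by attaching V to the resin and then couples A and G in that order.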
[0514] As discussed above, the EST-related nucleic acids, fragments of the EST-related nucleic acids, positional segments of the EST-related nucleic acids, or fragments of positional segments of the EST-related nucleic acids can be used for various purposes. The polynucleotides can be used to express recombinant protein for analysis, characterization or therapeutic use; for production of secreted polypeptides or chimeric polypeptides; for antibody production; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in disease states); as molecular weight markers on Southern gels; as chromosome markers or tags (when labeled) to identify chromosomes or to map related gene positions; to compare with endogenous DNA sequences in patients to identify potential genetic disorders; as probes to hybridize and thus discover novel, related DNA sequences; as a source of information to derive PCR primers for genetic fingerprinting; for selecting and making oligomers for attachment to a "gene chip" or other support, including for examination of expression patterns; to raise anti-protein antibodies using DNA immunization techniques; and as an antigen to raise anti-DNA antibodies or elicit another immune response. Where the polynucleotide encodes a protein or polypeptide which binds or potentially binds to another protein or polypeptide (such as, for example, in a receptor-ligand interaction), the polynucleotide can also be used in interaction trap assays (such as, for example, that described in Gyuris et al., Cell 75:791-803 (1993)) to identify polynucleotides encoding the other protein or polypeptide with which binding occurs or to identify inhibitors of the binding interaction.
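One of the listed uses, deriving PCR primers from a sequence, can be sketched as follows. This is a deliberately naive illustration under stated assumptions: the primers are simply the 5'-most bases of the template and the reverse complement of its 3'-most bases, and the melting temperature is estimated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough guide for short oligonucleotides. Practical primer design would also weigh GC content, self-complementarity and primer-dimer formation. All function names are hypothetical.

```python
# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (converted to uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Wallace rule estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primer_pair(template, length=20):
    """Naive pair: the 5'-most bases as forward primer and the reverse
    complement of the 3'-most bases as reverse primer."""
    t = template.upper()
    if len(t) < 2 * length:
        raise ValueError("template too short for non-overlapping primers")
    return t[:length], reverse_complement(t[-length:])
```

For a toy template such as `"ACGTTTTTGGGG"` with 4-base primers, the forward primer is `ACGT` and the reverse primer is `CCCC`; the Wallace estimate for `ACGT` is 12 °C, far below useful annealing temperatures, which is why real primers are longer.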

[0515] The proteins or polypeptides provided by the present invention can similarly be used in assays to determine biological activity, including in a panel of multiple proteins for high-throughput screening; to raise antibodies or to elicit another immune response; as a reagent (including the labeled reagent) in assays designed to quantitatively determine levels of the protein (or its receptor) in biological fluids; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in a disease state); and, of course, to isolate correlative receptors or ligands. Where the protein or polypeptide binds or potentially binds to another protein or polypeptide (such as, for example, in a receptor-ligand interaction), the protein can be used to identify the other protein with which binding occurs or to identify inhibitors of the binding interaction. Proteins or polypeptides involved in these binding interactions can also be used to screen for peptide or small molecule inhibitors or agonists of the binding interaction.

[0516] Any or all of these research utilities are capable of being developed into reagent grade or kit format for commercialization as research products.
[0517] Methods for performing the uses listed above are well known to those skilled in the art. References disclosing such methods include without limitation "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E.F. Fritsch and T. Maniatis eds., 1989, and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press, Berger, S.L. and A.R. Kimmel eds., 1987.
[0518] Polynucleotides and proteins or polypeptides of the present invention can also be used as nutritional sources or supplements. Such uses include without limitation use as a protein or amino acid supplement, use as a carbon source, use as a nitrogen source and use as a source of carbohydrate. In such cases the protein or polynucleotide of the invention can be added to the feed of a particular organism or can be administered as a separate solid or liquid preparation, such as in the form of powder, pills, solutions, suspensions or capsules. In the case of microorganisms, the protein or polynucleotide of the invention can be added to the medium in or on which the microorganism is cultured.
[0519] Although this invention has been described in 
terms of certain preferred embodiments, other embodi- 
ments which will be apparent to those of ordinary skill 
in the art in view of the disclosure herein are also within 
the scope of this invention. Accordingly, the scope of the
invention is intended to be defined only by reference to 
the appended claims. 



Claims 

1. A purified nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

2. A purified nucleic acid comprising at least 10 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

3. A purified nucleic acid comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

4. A purified nucleic acid comprising the coding sequence of a sequence selected from the group consisting of SEQ ID NOs: 24-4100.

5. A purified nucleic acid comprising the full coding sequence of a sequence selected from the group consisting of SEQ ID NOs: 3721-3811 wherein the full coding sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein.

6. A purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs: 3721-3811 which encodes the mature protein.

7. A purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs: 24-652 and 3721-3811 which encodes the signal peptide.

8. A purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177.

9. A purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs: 7798-7888.

10. A purified nucleic acid encoding a polypeptide comprising a mature protein included in a sequence selected from the group consisting of the sequences of SEQ ID NOs: 7798-7888.

11. A purified nucleic acid encoding a polypeptide comprising a signal peptide included in a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-4729 and 7798-7888.

12. A purified nucleic acid at least 15 nucleotides in length which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

13. A purified or isolated polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177.

14. A purified or isolated polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs: 7798-7888.

15. A purified or isolated polypeptide comprising a mature protein of a polypeptide selected from the group consisting of SEQ ID NOs: 7798-7888.

16. A purified or isolated polypeptide comprising a signal peptide of a sequence selected from the group consisting of the polypeptides of SEQ ID NOs: 4101-4729 and 7798-7888.

17. A purified or isolated polypeptide comprising at least 10 consecutive amino acids of a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177.



18. A method of making a cDNA comprising the steps of:

contacting a collection of mRNA molecules from human cells with a primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681;
hybridizing said primer to an mRNA in said collection that encodes said protein;
reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA;
making a second cDNA strand complementary to said first cDNA strand; and
isolating the resulting cDNA encoding said protein comprising said first cDNA strand and said second cDNA strand.

19. A purified cDNA obtainable by the method of Claim 18.

20. The cDNA of Claim 19 wherein said cDNA encodes at least a portion of a human polypeptide.

21. A method of making a cDNA comprising the steps of:

obtaining a cDNA comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681;
contacting said cDNA with a detectable probe comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 under conditions which permit said probe to hybridize to said cDNA;
identifying a cDNA which hybridizes to said detectable probe; and
isolating said cDNA which hybridizes to said probe.

22. A purified cDNA obtainable by the method of Claim 21.

23. The cDNA of Claim 22 wherein said cDNA encodes at least a portion of a human polypeptide.

24. A method of making a cDNA comprising the steps of:

contacting a collection of mRNA molecules from human cells with a first primer capable of hybridizing to the polyA tail of said mRNA;
hybridizing said first primer to said polyA tail;
reverse transcribing said mRNA to make a first cDNA strand;
making a second cDNA strand complementary to said first cDNA strand using at least one primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681; and
isolating the resulting cDNA comprising said first cDNA strand and said second cDNA strand.

25. A purified cDNA obtainable by the method of Claim 24.

26. The cDNA of Claim 25 wherein said cDNA encodes at least a portion of a human polypeptide.

27. The method of Claim 24, wherein the second cDNA strand is made by:

contacting said first cDNA strand with a first pair of primers, said first pair of primers comprising a second primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and a third primer having a sequence therein which is included within the sequence of said first primer;
performing a first polymerase chain reaction with said first pair of primers to generate a first PCR product;
contacting said first PCR product with a second pair of primers, said second pair of primers comprising a fourth primer, said fourth primer comprising at least 15 consecutive nucleotides of said sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and a fifth primer, wherein said fourth and fifth primers hybridize to sequences within said first PCR product; and
performing a second polymerase chain reaction, thereby generating a second PCR product.

28. A purified cDNA obtainable by the method of Claim 27.

29. The cDNA of Claim 28 wherein said cDNA encodes at least a portion of a human polypeptide.

30. The method of Claim 24 wherein the second cDNA strand is made by:

contacting said first cDNA strand with a second primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681;
hybridizing said second primer to said first strand cDNA; and
extending said hybridized second primer to generate said second cDNA strand.

31. A purified cDNA obtainable by the method of Claim 30.

32. The cDNA of Claim 28, wherein said cDNA encodes at least a portion of a human polypeptide.

33. A method of making a polypeptide comprising the steps of:

obtaining a cDNA which encodes a polypeptide encoded by a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 or a cDNA which encodes a polypeptide comprising at least 10 consecutive amino acids of a polypeptide encoded by a sequence selected from the group consisting of SEQ ID NOs: 24-4100;
inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter;
introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA; and
isolating said protein.

34. An isolated protein obtainable by the method of Claim 33.

35. A method of obtaining a promoter DNA comprising the steps of:

obtaining genomic DNA located upstream of a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681;
screening said genomic DNA to identify a promoter capable of directing transcription initiation; and
isolating said DNA comprising said identified promoter.

36. The method of Claim 35, wherein said obtaining step comprises walking from genomic DNA comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

37. The method of Claim 36, wherein said screening step comprises inserting genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 into a promoter reporter vector.

38. The method of Claim 36, wherein said screening step comprises identifying motifs in genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 which are transcription factor binding sites or transcription start sites.

39. An isolated promoter obtainable by the method of any one of Claims 34 to 38.

40. In an array of discrete ESTs or fragments thereof of at least 15 nucleotides in length, the improvement comprising inclusion in said array of at least one sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and fragments comprising at least 15 consecutive nucleotides of said sequence.

41. The array of Claim 40 including therein at least two sequences selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and fragments comprising at least 15 consecutive nucleotides of said sequences.

42. The array of Claim 40 including therein at least five 
sequences selected from the group consisting of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681, the sequences complementary to the 
sequences of SEQ ID NOs: 24-4100 and SEQ ID 
NOs: 8178-36681 and fragments comprising at 
least 15 consecutive nucleotides of said sequenc- 
es. 

43. An enriched population of recombinant nucleic ac- 
ids, said recombinant nucleic acids comprising an 
insert nucleic acid and a backbone nucleic acid, 
wherein at least 5% of said insert nucleic acids in 
said population comprise a sequence selected from 
the group consisting of SEQ ID NOs: 24-4100 and 
SEQ ID NOs: 8178-36681 and the sequences com- 
plementary to SEQ ID NOs: 24-4100 and SEQ ID 
NOs: 8178-36681. 

44. A purified or isolated antibody capable of specifical- 
ly binding to a polypeptide comprising a sequence 
selected from the group consisting of SEQ ID NOs: 
4101-8177. 

45. A purified or isolated antibody capable of specifical- 
ly binding to a polypeptide comprising at least 10 
consecutive amino acids of a sequence selected 
from the group consisting of SEQ ID NOs:
4101-8177. 

46. An antibody composition capable of selectively 
binding to an epitope-containing fragment of a 
polypeptide comprising a contiguous span of at 
least 8 amino acids of any of SEQ ID NOs: 
4101-8177, wherein said antibody is polyclonal or 
monoclonal. 

47. A computer readable medium having stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177.

48. A computer system comprising a processor and a data storage device wherein said data storage device has stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177.

49. The computer system of Claim 48 further comprising a sequence comparer and a data storage device having reference sequences stored thereon.

50. The computer system of Claim 49 wherein said sequence comparer comprises a computer program which indicates polymorphisms.

51. The computer system of Claim 48 further comprising an identifier which identifies features in said sequence.

52. A method for comparing a first sequence to a reference sequence wherein said first sequence is selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177 comprising the steps of:

reading said first sequence and said reference sequence through use of a computer program which compares sequences; and
determining differences between said first sequence and said reference sequence with said computer program.

53. The method of Claim 52, wherein said step of determining differences between the first sequence and the reference sequence comprises identifying polymorphisms.

54. A method for identifying a feature in a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177 comprising the steps of:

reading said sequence through the use of a computer program which identifies features in sequences; and
identifying features in said sequence with said computer program.

55. A vector comprising a nucleic acid according to any one of Claims 1 to 12.

56. A host cell containing a nucleic acid of Claim 55.

57. A method of making a nucleic acid of Claim 1 comprising the steps of:

introducing said nucleic acid into a host cell such that said nucleic acid is present in multiple copies in each host cell; and
isolating said nucleic acid from said host cell.

58. A method of making a nucleic acid of any one of Claims 1 to 12 comprising the step of sequentially linking together the nucleotides in said nucleic acids.

59. A method of making a polypeptide of any one of Claims 13 to 17 wherein said polypeptide is 150 amino acids in length or less comprising the step of sequentially linking together the amino acids in said polypeptide.

60. A method of making a polypeptide of any one of Claims 13 to 17 wherein said polypeptide is 120 amino acids in length or less comprising the step of sequentially linking together the amino acids in said polypeptide.
Minimum signal   False positive   False negative   proba(0.1)   proba(0.2)
peptide score    rate             rate

3.5              0.121            0.036            0.467        0.664
4                0.096            0.06             0.519        0.708
4.5              0.078            0.079            0.565        0.745
5                0.062            0.098            0.615        0.782
5.5              0.05             0.127            0.659        0.813
6                0.04             0.163            0.694        0.836
6.5              0.033            0.202            0.725        0.855
7                0.025            0.248            0.763        0.878
7.5              0.021            0.304            0.78         0.889
8                0.015            0.368            0.816        0.909
8.5              0.012            0.418            0.836        0.92
9                0.009            0.512            0.856        0.93
9.5              0.007            0.581            0.863        0.934
10               0.006            0.679            0.835        0.919
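The table does not itself define the proba columns. One plausible reading, offered here only as an assumption, is that proba(p) is the probability that a sequence scoring at or above the threshold truly encodes a signal peptide when a fraction p of the screened sequences do so. Bayes' rule then gives p(1 - FN) / (p(1 - FN) + (1 - p)FP), where FP and FN are the false positive and false negative rates. For the 3.5 row this gives about 0.470 and 0.666, close to the tabulated 0.467 and 0.664, but the agreement is looser at the highest thresholds, so the definition actually used may differ. The function name below is hypothetical.

```python
def posterior_signal_probability(prior, false_positive_rate, false_negative_rate):
    """Bayes' rule estimate of P(true signal peptide | score >= threshold).

    prior: assumed fraction p of screened sequences that truly encode a
        signal peptide (0.1 or 0.2 for the proba columns, under this reading).
    """
    true_hits = prior * (1.0 - false_negative_rate)
    false_hits = (1.0 - prior) * false_positive_rate
    return true_hits / (true_hits + false_hits)


# Row for a minimum score of 3.5: FP = 0.121, FN = 0.036.
for p in (0.1, 0.2):
    print(p, round(posterior_signal_probability(p, 0.121, 0.036), 3))
```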
Description of Transcription Factor Binding Sites present on promoters isolated from SignalTag sequences

Promoter sequence P13H2 (546 bp):

Matrix            Position  Orientation  Score  Length  Sequence          Location in SEQ ID NO: 17
CMYB_01           -502      +            0.983  9       TGTCAGTTG         17-25
MYOD_Q6           -501      -            0.961  10      CCCAACTGAC        complement of 18-27
S8_01             -444      -            0.960  11      aatagaattag       complement of 75-85
S8_01             -425      +            0.966  11      AACTAAATTAG       94-104
DELTAEF1_01       -390      -            0.960  11      gcacacctcag       complement of 129-139
GATA_C            -364      -            0.964  11      AGATAAATCCA       complement of 155-165
CMYB_01           -349      +            0.958  9       CTTCAGTTG         170-178
GATA1_02          -343      +            0.959  14      ttgtagataggaca    176-189
GATA_C            -339      +            0.953  11      agataggacat       180-190
TAL1ALPHAE47_01   -235      +            0.973  16      cataacagatggtaag  284-299
TAL1BETAE47_01    -235      +            0.983  16      CATAACAGATGGTAAG  284-299
TAL1BETAITF2_01   -235      +            0.978  16      cataacagatggtaag  284-299
MYOD_Q6           -232      -            0.954  10      ACCATCTGTT        complement of 287-296
GATA1_04          -217      -            0.953  13      tcaagataaagta     complement of 302-314
IK1_01            -126      +            0.963  13      AGTTGGGAATTCC     393-405
IK2_01            -126      +            0.985  12      AGTTGGGAATTC      393-404
CREL_01           -123      +            0.962  10      TGGGAATTCC        396-405
GATA1_02          -96       +            0.950  14      TCAGTGATATGGCA    423-436
SRY_02            -41       -            0.951  12      TAAAACAAAACA      complement of 478-489
E2F_02            -33       +            0.957  8       TTTAGCGC          486-493
MZF1_01           -5        -            0.975  8       (not legible)     complement of 514-521

B4 (861 bp):

Matrix            Position  Orientation  Score  Length  Sequence          Location in SEQ ID NO: 20
NFY_Q6            -748      -            0.956  11      GGACCAATCAT       complement of 60-70
MZF1_01           -738      +            0.962  8       CCTGGGGA          70-77
CMYB_01           -684      +            0.994  9       TGACCGTTG         124-132
VMYB_02           -682      -            0.985  9       TCCAACGGT         complement of 126-134
STAT_01           -673      +            0.968  9       TTCCTGGAA         135-143
STAT_01           -673      -            0.951  9       TTCCAGGAA         complement of 135-143
MZF1_01           -556      -            0.956  8       TTGGGGGA          complement of 252-259
(not legible)     -451      +            0.965  12      GAATGGGATTTC      357-368
MZF1_01           -424      +            0.986  8       AGAGGGGA          384-391
SRY_02            -398      -            0.955  12      GAAAACAAAACA      complement of 410-421
MZF1_01           -216      +            0.960  8       GAAGGGGA          592-599
MYOD_Q6           -190      +            0.981  10      AGCATCTGCC        618-627
DELTAEF1_01       -176      +            0.958  11      TCCCACCTTCC       632-642
S8_01             5         -            0.992  11      GA??C?ATTAT       complement of 813-823
MZF1_01           16        -            0.986  8       AGAGGGGA          complement of 824-831

B6 (555 bp):

Matrix            Position  Orientation  Score  Length  Sequence          Location in SEQ ID NO: 23
ARNT_01           -311      +            0.964  16      GGACTCACGTGCTGCT  191-206
NMYC_01           -309      +            0.965  12      ACTCACGTGCTG      193-204
USF_01            -309      +            0.985  12      ACTCACGTGCTG      193-204
USF_01            -309      -            0.985  12      CAGCACGTGAGT      complement of 193-204
NMYC_01           -309      -            0.956  12      CAGCACGTGAGT      complement of 193-204
MYCMAX_02         -309      -            0.972  12      CAGCACGTGAGT      complement of 193-204
USF_C             -307      +            0.997  8       tcacgtgc          195-202
USF_C             -307      -            0.991  8       GCACGTGA          complement of 195-202
MZF1_01           -292      -            0.968  8       CATGGGGA          complement of 210-217
ELK1_02           -105      +            0.963  14      CTCTCCGGAAGCCT    397-410
CETSTP54_01       -102      +            0.974  10      TCCGGAAGCC        400-409
AP1_Q4            -42       -            0.963  11      AGTGACTGAAC       complement of 460-470
AP1FJ_Q2          -42       -            0.961  11      AGTGACTGAAC       complement of 460-470
PADS_C            45        +            1.000  9       TCTGGTC?C         547-555
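In the tables above, a location given as "complement of" refers to the opposite strand, and the listed sequence is the reverse complement of that stretch of the numbered strand. For example, the ARNT_01 site at 191-206 of SEQ ID NO: 23 contains ACTCACGTGCTG at 193-204, and the complementary-strand entries for the same location read CAGCACGTGAGT. The sketch below makes that relationship checkable. The helper names and the `offset` parameter (the coordinate of the first base of the string passed in) are hypothetical, and only the 16-base ARNT_01 site is used because the full promoter sequences are not reproduced here.

```python
# Complement table that preserves the letter case used in the tables.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string, keeping upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def site_sequence(seq, start, end, complement=False, offset=1):
    """Return the site at 1-based inclusive coordinates start..end.

    offset is the coordinate of seq[0]; sites listed as "complement of"
    are returned as the reverse complement of the numbered strand.
    """
    fragment = seq[start - offset:end - offset + 1]
    return reverse_complement(fragment) if complement else fragment


# The ARNT_01 site, positions 191-206 of SEQ ID NO: 23 according to the table.
ARNT_SITE = "GGACTCACGTGCTGCT"
print(site_sequence(ARNT_SITE, 193, 204, offset=191))
print(site_sequence(ARNT_SITE, 193, 204, complement=True, offset=191))
```

The same helper applied to 195-202 with `complement=True` returns GCACGTGA, matching the USF_C entry listed on the complementary strand.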



FIGURES



